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FOREWORD 


The  work  reported  in  this  paper  was  done  in  the  Applied  Mathematics 
section  of  the  Science  and  Mathematics  Research  Group.  The  major  part 
of  the  work  was  done  under  Foundational  Research  funds. 

The  problems  that  are  resolved  in  this  paper  were  first  brought 
to  the  attention  of  the  authors  by  Dr.  Klaus  Abt  in  connection  with 
his  work  on  armor  penetration. 

The  original  IBM  7030  (STRETCH)  code  in  FORTRAN  IV  was  developed 
by  Mr.  Travis  Herring.  Mr.  Robert  Belsky  and  Mr.  Douglas  C  don 
produced  the  machine  code  for  plotting  confidence  ellipses  as  output. 

Mr.  Alfred  Morris  supplied  the  coefficients  for  two  of  the  asymptotic 
expansions  used  in  the  original  code.  Cody's  algorithm  for  computing 
the  normal  probability  integral  was  analyzed  and  then  developed  as  a 
STRAP  code  for  STRETCH  by  Mr.  Gordon  Barker.  Subsequently  he  incorporated 
this  program  into  two  versions  of  the  main  computing  program. 

Dr.  Marlin  Thomas  supplied  one  of  the  more  interesting  examples  that 
are  cited  in  the  paper. 


Released  by: 


Ralph  A.  Niemann,  Head 
Warfare  Analysis  Department 
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ABSTRACT 


Necessary  and  sufficient  conditions  are  obtained  for  the  existence 
of  the  maximum  likelihood  estimates  (MLE)  of  the  parameters  of  a  normal 
distribution  for  quantal  responses.  It  is  shown  that  whenever  the  MLE 
estimates  exist  they  are  unique.  A  modified  Newton -Raphson  procedure 
is  given  which  will  converge  globally  to  the  MLE  estimates.  These 
results  are  new  and  directly  applicable  to  an  armor  plate  penetration 
problem  or  any  other  types  of  experiments  based  on  quantal  responses 
which  fall  under  a  normal  distribution. 

A  computer  program  is  described  which  includes  as  output  a  set  of 
plotted  confidence  ellipses  centered  about  the  MLE.  Various  examples 
and  the  corresponding  computer  outputs  are  given. 

Probit  analysis  and  confidence  regions  for  small  samples  are 
discussed  in  separate  appendices . 
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I.  INTRODUCTION 


This  report  gives  a  mathematical  analysis  o£  maximum  likelihood 
theory  as  it  is  used  to  find  the  "best"  estimates  [7,  Chap.  32], 

H  ,  'a  of  the  mean,  fiQ,  and  the  standard  deviation,  aQ ,  of  a  normal 
distribution  which  governs  the  variations  in  measured  sensitivity  data. 
By  sensitivity  data,  we  mean  a  collection  of  measurements  or  determina¬ 
tions  from  experiments  for  which  a  stimulus  is  usually  only  applied 
once  to  each  item,  and  for  which  the  response  in  every  case  is  quantal, 
i.e.,  can  be  described  as  a  success  or  failure  by  some  arbitrary 
criterion,  [?],  [l4] .  Experiments  for  dosage  mortality  studies,  £l3], 
and  for  armor  plate  evaluations  are  of  this  nature.  Statisticians 
categorize  the  treatment  of  such  problems  under  the  general  term-- 
sensitivity  analysis.  The  main  results  given  in  this  paper  are  suffic¬ 
iently  general  to  include  situations  in  which  levels  of  stimulus  cannot 
be  precisely  assigned  in  advance.  The  basic  paper  dealing  with  this 
particular  phase  was  written  in  1956  by  Golub  and  Grubbs,  1>J. 

The  first  objective,  following  M,  is  to  set  up  the  sensitivity 
problem  in  mathematical  terms .  The  theory  of  maximizing  a  likelihood 
function  as  popularized  and  extensively  developed  by  Fisher,  [Uj, 
plays  a  fundamental  role.  Once  the  problem  has  been  defined  in 
mathematical  terms,  we  resolve  the  following  mathematical  questions 
which  arise,  but  which  have  been  open  heretofore: 
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(a)  Under  what  conditions  does  a  maximum  likelihood  solution 
exist? 

(b)  When  a  solution  exists,  is  it  unique? 

(c)  Given  that  a  solution  exists,  can  a  computational  procedure 

be  found  which  is  guaranteed  to  converge  globally  (independent 

of  starting  values  or  initial  estimates)  to  the  "best"  estimates, 

fi  ,  a  of  the  true  parameters  f/Q,  and  <tq? 

For  typographical  convenience  we  use  the  notation  fl  ,  O  for  maximum 

a  a 

likelihood  estimates  instead  of  tne  more  common  U  ,  a  . 

In  Section  II,  the  hypothesis  upon  which  maximum  likelihood 
estimates  are  based  is  discussed.  A  likelihood  function  is  derived. 

An  armor  plate  penetration  problem  is  described  and  is  used  to 
facilitate  the  derivation.  In  Section  III  questions  (a),  (b)  ,  (c) , 
given  above,  are  answered.  Question  (a)  is  answered  by  obtaining  a 
set  of  necessary  and  sufficient  conditions  for  the  existence  of  a 
maximum  likelihood  solution.  Question  (b)  is  answered  in  the  affirma¬ 
tive.  Question  (c)  is  also  answered  in  the  affirmative;  a  modified 
Newton-Raphson  procedure  is  proved  to  converge  globally  to  the  maximum 
likelihood  estimates.  The  main  results  are  given  by  theorems  3, 

4,  and  5. 

In  Section  IV,  for  completeness,  a  derivation  is  given  of  the 
expressions  for  the  elements  of  an  associated  covariance  matrix  and 
of  the  corresponding  confidence  ellipses.  Section  V  describes  the 
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computer  program  as  it  is  actually  used.  Several  examples  are  given. 


A  discussion  and  a  comparison  of  our  results  with  those 
obtained  from  Finney's  Probit  Analysis ,  [lO] ,  are  given  in  Appendix  C. 
Appendix  D  is  concerned  with  some  remarks  on  small  sample  theory. 

II.  CONSTRUCTION  OF  THE  LIKELIHOOD  FUNCTION 

The  idea  of  a  likelihood  function  is  founded  on  the  premise  that 
if  one  can  specify  a  mathematical  function,  F(c^,  c2,  which 

depends  on  the  parameters  c^,  to  represent  the  probability  of  occurrence 
of  a  set  of  events  {j^}  «  then  the  "best"  estimates,  in  a  statistical 
sense,  for  the  c^,  are  those  values  c^  for  which  F(c^,  C2,  . cTj)  is 
an  absolute  maximum.  More  precisely:  Given  the  occurrence  of  a  set  of 
events  { J^}  ,  the  likelihood  function  is  the  mathematical  function, 
F(c^,  ....  Cj) ,  which  represents  the  probability  of  the  occurrence  of 
the  events  {j^}  where  the  Cfc(k  *  1,2,  ...,  j)  are  parameters  whose 
true  values  are  unknown.  The  "best"  estimates  in  a  statistical  sense 
for  the  parameters  are  those  values  c^  which  make  F  an  absolute  maximum. 
Heuristically,  the  reason  for  estimating  the  c^  thusly,  is  that  since 
the  {Ji}  have  occurred,  we  should  estimate  the  so  that  the  proba¬ 
bility  of  their  occurrence  is  as  large  as  possible.  This  result  is 
obtained  by  making  F  an  absolute  maximum.  An  elementary  example  is 
discussed  in  detail  in  £l7;  page  152] « 

Ah  important  problem  in  armor  plate  penetration,  which  gave  rise 
to  the  studies  reported  in  this  paper,  will  be  used  to  derive  the 
likelihood  function  on  which  the  analysis  is  based. 


In  armor  plate  testing,  the  investigator  wants  to  know  the  minimum 
speed  required  by  a  projectile  of  given  type  to  penetrate  a  steel  plate 
of  given  size  and  composition.  By  minimum  or  critical  penetration 
speed  with  respect  to  a  given  plate,  we  mean  that  speed  below  which 
no  penetration  of  the  plate  would  occur  and  above  which  penetration 
for  that  particular  plate  would  always  take  place;  This  critical 
penetration  speed  of  the  projectile  however  will  vary  among  a  set  of 
presumably  identical  plates,  primarily  because  of  random  variations 
in  steel  composition,  flaws,  etc.  from  plate  to  plate.  Hence,  the 
experimenter  settles  for  an  average  critical  speed  of  penetration, 

//Q,  where  the  word  average  is  used  in  the  usual  statistical  sense. 

The  real  number  fiQ  can  be  estimated  from  a  knowledge  of  the  likelihood 
function  which  is  associated  with  the  problem. 

Hence,  assume  there  exists  an  infinite  population  of  steel  plates 
identical  in  form  and  manufactured  from  the  same  process.  Each  plate 
is  characterized  by  its  critical  speed  of  penetration  so  that  a  mean 
or  average  critical  speed,  fl Q,  can  be  associated  with  the  population. 

The  assumption  is  made  that  the  critical  speed  for  each  plate  is  normally 
distributed  about  the  mean  flQ  with  a  standard  deviation,  aQ.  The 
techniques  that  are  used  to  determine  the  validity  of  the  normality 
assumption,  and  the  transformations  of  variables  that  can  often  be 
used  to  approximate  normality,  £8],  are  outside  the  scope  of  the 
present  study. 
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Consider  the  following  experiment.  Five  plates  are  drawn  from  the 
infinite  population,  numbered  one  through  five,  and  tested  by  firing  an 
identical  projectile  once  at  each  plate.  Due  to  variations  in  gun 
powder,  gun  barrel  eccentricities,  etc.,  the  desired  impact  speed  for 
each  projectile  is  generally  not  precisely  realized,  i.e.,  the  stimuli 
are  not  completely  under  control,  [l4],  Suppose  the  results  of  the 
tests  show  that  plate  # 1  was  penetrated  by  the  projectile  traveling 
at  a  speed  a^  (from  which  we  conclude  the  critical  speed  of  plate  #1 
is  not  larger  than  ai) .  This  test  is  called  a  success.  Similarly, 
suppose  "successes"  were  recorded  for  plates  #2  and  #5  at  projectile 
speeds  &2  and  a3*  respectively.  On  the  other  hand,  suppose  "failures" 
were  noted  for  plates  #3  and  #4  at  speeds  b^  and  b2,  respectively 
(these  plates  were  not  penetrated  so  that  e.g.,  plate  #3  must  have  a 
critical  speed  larger  than  b-^) .  Since  the  plates  are  assumed  to  be 
normally  distributed  with  respect  to  their  critical  speeds ,  it  is 
easy  to  express  the  probability,  p^,  of  drawing  a  plate  with  a  critical 
speed  no  iarger  than  a^,  namely 

<al  -  V  lao 

Pi  “  P  [<*l  -  J 

-  OO 

By  similar  reasoning,  the  probabilities  p2  and  p2  of  drawing  plates 
#2  and  #5,  respectively  are 


i 


The  probability  q^  of  drawing  plate  #3,  for  which  a  failure  was  recorded 
at  a  speed  b^,  is  given  by 


exp(-t  /2)dt. 


(b,  -  M)/ti r 
i.  o  o 


Thus  denotes  the  probability  of  drawing  a  plate  with  a  critical 
speed  no  smaller  than  b^.  The  probability  of  drawing  plate  #4  is 
then  qf(b2  -  1  *  q„.  Assuming  all  of  the  five  cases  are 

independent  of  one  another ,  it  follows  directly  from  elementary 
concepts  that  the  probability  FQ  of  observing  the  events  as  they 
occurred  is  given  by  the  product 

Fo  “  PlP2p3qlq2  ‘ 

In  general,  if  one  tests  n  +  m  plates  and  records  n  successes  at 
penetration  speeds  a^,  a£,  . ..,  an,  and  m  failures  at  penetration 
speeds  b^,  b2>  . ..,  b^,  then  the  probability  that  this  sequence  of 
events  occurs,  in  the  order  observed,  is  given  by 


-  17  pt  77  q<  • 

i-l  j-i  J 


The  likelihood  function  is  simply  obtained  from  (1)  by  replacing  the 
unknown  quantities  //Q  and  by  free  parameters ,  or  variables ,  /i 
and  <r,  respectively.  In  order  to  keep  notation  to  a  minimum  and 
since  the  expression  given  by  (1)  will  not  appear  again,  we  use  the 
same  notation  to  expreaB  the  likelihood  function,  F,  as 
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(2) 


n  m 

F  =  77  P<  77  q,, 

i=l  j=l  J 

where ,  her eaf ter , 

Pi  =  P  [(ai  -  M)/<r]  ,  (3) 

==  q[(bj  -  v)/cr].  (4) 

According  to  maximum  likelihood  theory,  "best"  estimates,  fi  ,  O 
of  the  parameters  flQ  and  oQ  are  obtained  by  maximizing  F,  in  (2) , 
as  a  function  of  the  variables  [i  and  <7.  Thus,  the  statistical 
problem  has  been  reduced  to  the  mathematical  one  of  maximizing  a 
function,  F,  of  two  independent  variables  fl  and  tr,  which  has 
differentials  of  all  orders.  The  mathematical  problem  raises  the 
questions  posed  in  the  Introduction. 

It  is  worth  noting  that  the  estimates  ^  and  O’,  obtained  for 
and  <tq  will  depend  on  the  input  sequences  {a^J  and  {bj}. 
Hence  some  measure  of  reliability  in  the  estimates  is  needed.  Such 
measures  can  be  obtained  by  considering  so-called  confidence  regions 
which  turn  out  to  be  ellipses  in  the  fl  o -plane.  This  phase  of  the 
study  will  be  covered  in  some  detail  .  Section  IV. 

III.  ANALYSIS  OF  THE  LIKELIHOOD  FUNCTION 

Let 

n  m 

~Li.fl,  a)  =  log  F(J£,<7)  =  £  log  p.  +  E  log  q,.  (5) 

6  i=l  1  j-l  J 


I 

l 

l 

i r 


f 


We  treat  L  rather  than  F  hereafter,  since  less  notation  is  needed. 
Clearly  F  is  a  maximum  at  a  point  if  and  only  if  L  is  a  maximum  at  the 
same  point.  The  variables  fx  and  a  &.'c  replaced  by  new  variables 
dt  and  /# ,  through  the  1-1  transformations: 


a  =  m<y  ,  (//  =  ct/0), 

£  =  1  /a  >  0,  (a  =  1  /£). 


In  addition,  let 


S.L  *  ai£  “  a  > 

~,-b3P  -a, 

so  that  in  terms  of  the  new  variables,  a  ,  0  ,  (3)  and  (4)  become 


(6) 

(7) 

(8) 
(9) 


/'si 

Pi  =  P(Si)  -  J  z  (t)dt, 

-oo 


qj  =  =  I  z^dt> 


(10) 


(11) 


where 


2.  (t)  =  — 1  -  exp(-t2/2)  . 

|/  2  7T 


(12) 


We  will  also  have  need  for  the  following  partial  derivatives  of  L: 


_8L 

da 


m 


L  Cy i /q . ) 
1=1  J  J 


L  (xi/Pi), 


(13) 


-  ff  jaaaaiiiaia 


Lfl  ■  Tfl  "  £  "  £  b4  Cy  1/q1> »  (14) 

p  &P  i-1  j=l  J  J  J 

-  -  2^  (yj/qj)[(yj/qj)  -  t j  ]  -  (xi/pi)  Jcxi/Pi)  +  ,  (is) 

L«/?  “  '  fcj] 

+  X  «i(xi/p1)J^(xi/pi)  +  stJ  ,  (16; 

LW  “  J/j •  fcj] 

-  X  «i  (^/Pp  [^/Pi.)  +  8i]  »  (17 


where 


xt  ~  z(8t) , 


Yj  =  z(tj) .  (1- 

The  and  bj  are  arbitrary  real  numbers .  o  and  /?  by  their  nature 


are  positive. 


The  1 erara  which  follows  is  stated  for  easy  reference.  The  proof 
la  not  given,  it  follows  from  a  well-known  theorem  in  analysis, 

[2,  p.  1493,  and  elementary  considerations. 
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i 

l 

LEMMA  1.  Let  f(g,ff)  have  continuous  second  partial  derivatives  in 
an  open  region  T  of  the  q/?-plane  and  let  there  exist  a  point 
(ct,/3)€  T  such  that  at  (a, 8) 

faa  <  °>  faa  £  pp  "  £ap  >  °> 

then  f(ot  ,jg)  has  a  maximum  at  id  ,8)  if  and  only  if 

fa  (5,/?)  =  £p  (a,/?)  =  0. 

Moreover,  if  £  has  a  maximum  at  (&,&),  then  the  equalities  must 
hold  resardless  of  the  above  inequalities . 

The  possibility  that  all  the  a^  are  equal  (say  to  a) ,  and  that 
all  the  bj  are  equal  (say  to  b)  is  ruled  out  by  Lemmas  2  and  3. 

LEMMA  2.  If  L  has  a  maximum  and  a^  =  a,  bj  =  b  for  each  i  and  j,  then 

a  =  b. 

Proof:  If  L  has  a  maximum  at  (ot  ,/3),  then  by  Lemma  1,  L^  =  Lp  =  0 
at  this  point.  By  equating  the  right  hand  sides  of  (13)  and  (14)  to 
zero,  for  the  arguments 

Si  =  a p  -  a  ,  t.  =  bp  -  a  , 
one  obtains  n^/p^  =  m(yj/qj), 


na(Xi/pi)  =  mh(yj/qj)  . 

Q  .E  ,D . 

LEMMA  3.  If  for  each  i  and  j,  a£  =  bj  =  c,  then  L  assumes 

its  maximum 

at  every  point  of  the  straight  line 

a  =  cp  -  s* , 

(20) 

where  s  is  determined  by 

p(s*)  =  n/ (m  +  n) . 

(21) 
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Proof:  Since  c  is  a  fixed  constant, the  function  L  can  be  considered 
in  this  case  as  a  function  of  the  single  variable  s  =  c/J  -  ct ,  so  that 

L(s)  =  n  log  p(s)  +  m  log  q(s)  , 


and 

L'(s)  =  (nx/p)  -  (my/q)  =  x  [(n/p)  -  (m/q)]  . 

Since  L"  <  0  when  L'(s)  =  0,  it  follows  that  L  attains  a  maximum  at 
those  values  of  s  which  satisfy 

p(s)  »  n/(m  +  n)  <  1. 

But  p(s)  is  a  positive  monotone  continuously  increasing  function  of  s 
and  less  than  one.  Therefore  there  exists  a  unique  value,  s  =  s*,  for 
which  (21)  holds  and  L  takes  a  maximum  for  all  (ct  , fi)  which  satisfy 
(20). 


Lemma  3  implies  the  obvious  conclusion  that  best  estimates  cannot 
be  determined  if  the  stimulus  is  maintained  at  the  same  level  for  all 
experiments.  It  is  also  necessary  that  both  the  {a^}  and  {bj}  sets 
bq  non-empty.  If  say  {bj  }  were  empty,  then  from  (13),  La  would  always 
be  negative  and  L  could  not  have  a  maximum.  Hence,  it  is  assumed 
hereafter  that  neither  of  the  sets  {*$_}»  {bj}  are  empty  and  that  at 
least  one  of  the  sets  must  contain  at  least  two  elements  which  differ. 

The  next  lemma  will  be  used  to  prove  Theorem  1. 

LEMMA  4.  If  t  € (-o© ,oo) ,  then 

[z (t) /q(t)]  -  t  >  0,  (22) 

[z(t)/p(t)]  +  t  >  0.  (23) 
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i:  i 

b-  ! 


Proof:  Consider 


Then,  since 


we  have  that 


f(t)  =  z  -  tq. 


z(±oo)  -  0,  q(oo)  -  0,  qC-oo)  =  1, 


lio  f(c)  =  lim  (-t>  =  +oo 
t  -oo  t  ■>  -oo 


*  lim  ! -tqCfc>  J  ^  lim  -j —  f  uexp(-u2/2)du 

t-+oo  t->oo  *■-* i Przr 


t->oo  l/in  t 


lim  exp(-t2/2)  =0. 

t->oo  V  2  7T 


In  addition. 


f'(t)  =  -tz  -  q  +  tz  =  - 


q  <  0. 


Therefore,  it  follows  that  f(t)  >  0,  and  since  q(t)  >0 

W.  _  z(t) 

q(t)  “  q(t)  "  t  >  °*  Vt€  C-oo,oo). 

The  proof  for  (23)  follows  by  replacing  t  by  (-t)  in  (22)  and  using 
the  facts  that  z(t)  =  z  (-t) ,  q(-t)  ^  p(t). 

THEOREM  1.  If  ,(«,#)  is  any  point  in  the  qjg-plane.  thsn 

L0tCt  ^  °»  L/30  <  °-  (24 


■V,  i*4 vi  ix.iteiC'4 eVi 


Proof;  The  expressions  for  Laa  end  L^^  are  given  by  (15)  and 
(17).  The  quantities  (yj/qj)  and  (*j_/pj)  are  positive  and  from 
Lemma  4, 

(yj/qj)  -  tj  >o» 

(x^/p^)  +  8^  ^  0.  Q.E.D. 

Where  no  ambiguity  will  occur,  it  will  be  understood  that  sums  over  i 
run  from  one  to  n  and  sums  over  j  run  from  one  to  m. 

Another  useful  property  of  L  is  given  by  the  following  theorem: 
THEOREM  2 .  The  discriminant  of  L, 

A  S  L^  ,  (25) 

is  positive  for  all  (ot ,  3) . 

Proof :  Let 

Ui  S  C*i/Pi>[C*i/Pi>  +  SjJ  (>0)  (26) 

Vj  =  (yj/q<)  [(yj/qj)  -  t j  ]  (>0).  (27) 

Then  A  can  be  expressed  in  terms  of  the  right  hand  sides  of  (15) , 
(16),  (17)  accordingly: 

A  -  L&l Ut  +  Xb/VjTVj  +  Xa^UjXVj 

+  LbiZWi£V±  -  (Za^)2  -  (-TbjVj)2  -  2Z)aiUii:b;JVj. 

The  Cauchy-Schwarz  inequality  requires  that 

(Ta^)2  ^  L**vtL iTt>  (28) 
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(Xb^)2  ^  x'b.2v.rvr 


(29) 


Equality  holds  for  (28)  if  and  only  if 

at2  VvZ  *  K  VU^  , 

where  K  is  any  real  positive  number.  Clearly,  this  relation  implies 
all  a^  are  equal,  since  lh  >  0  by  (23).  Similarly,  equality  in  (29) 
requires  that  all  bj  be  equal.  But,  by  the  requirement  that  at  least 
two  of  the  a^  or  bj  must  differ,  at  least  one  of  the  inequalities  (28), 
(29)  must  be  strict.  Hence,  using  (28)  and  (29)  twice, 

A  >  XaJ'UiXVj  +  Xb^VjXUi  -  2£aiUi£bjVj  (30) 

>  o:a1ui)2(2:v:i/2:ui)  +  a:bjvj)2a:ui/z:vj)  -  2£aiui£bjvj 

-  [(XVj/ZUi)172^^  -  iLvi/EV.)1/Z£b .V.]2  ^  o. 

We  are  now  able  to  use  a  well  known  theorem  of  analysis  to  obtain 
an  answer  to  question  (b)  of  the  Introduction.  The  proof  is  given  for 
completeness . 

THEOREM  3 .  There  exists  at  most  one  point  (ot  ,  3)  at  which  L  assumes 
a  maximum. 

Proof:  Assume  L  has  a  local  maximum  at  two  different  points  (ot^,  /S^) 
and  («2>  ^2^  •  The  Taylor  formula  for  a  function  of  two  independent 
variables  gives 

L(ce2,  £2)  =  L(av  /?x)  +  •  An  +  2ir?T  •  d2L  -  An  , 

(31) 
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where  Vl>  represents  the  gradient  of  L  with  components  and  ; 
D^L  is  the  matrix 


(32) 


and  At)  is  the  column  vector  with  first  and  second  components 
Act  «  (ot2  -  cep  and  A0  =  (02  -  0  P  ,  respectively.  The 
superscript  T  on  Atj  indicates  the  transpose.  The  elements  of  D^L 
in  (31)  are  evaluated  at  a  point  | ,  where  for  some  c (0  <  e  <  1) , 


£  -  V  +  €  AT]  ,  T]  -  (ce1,  /Sp. 

By  Theorems  1  and  2, 

/■ 

Lace  ^  ^00  ^ 

A  >  o, 


(33) 


hence  D^L  is  negative  definite  for  all  (ct  ,/S).  But  since  L  has  a 
maximum  at  (ce^,  0^)  this  implies  by  Lemma  1  that  La  **  Lyg  =  0  at 
<  Ct  i ,  >3  x)  .  Therefore  (31)  reduces  to 


L(a2,/32)  -  L(ctl,^1)  +  (4n)T  •  D2 L  •  (4n)  <  L(«1>  £i>- 


Interchanging  the  roles  of  (ct2,  /?p  and  (a^,^p  and  applying  the 
same  arguments  leads  to 

I*(<*i»^j)  <  L(a2 , 02)  • 

This  inconsistency  can  be  removed  only  if  Atf  *  0,  which  implies 
(cti,/Si)  and  (Ct2»>?2^  coincide.  Q.E.D. 
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It  will  be  assumed  hereafter  that  the  and  bj  are  separately 
ordered  in  increasing  magnitude,  i.e., 

*1  ^  a2  ^  '*•  ^  ®n’ 

(3h/ 

bi^  \>2  ••• 

Question  (a)  as  stated  in  the  Introduction,  is  a  natural  one  to 
raise  now,  i.e.,  under  what  conditions  on  the  a^_  and  bj  does  L  have  a 
maximum?  The  answer  is  supplied  by  Theorem  4.  The  necessity  of  (35) 
is  well-known,  [l4]  ,  [l8]  . 

THEOREM  4.  The  function  L  has  a  unique  maximum  for  /3  >  0  if  and  onl 
if  the  quantities  a-;  and  b^  satisfy  the  following  inequalities: 


“1  <  V  (3 


iXb.  <  t  £  a • . 


Proof:  If  L  has  a  maximum  at  (a  ,/5)  ,  then  from  (13) ,  (14) ,  and 

Lemma  1,  we  have 

La(«,/S)  -  Zfri/Pi)  -  D(yj/qj)  =0,  (3 

L^(0t,yS)  =  Xai(xi/pi)  -  £bj(yj/qj)  =0,  (3 

where  all  quantities  are  evaluated  at  (a,yS).  From  (38),  and  by 
the  orderings  specified  in  (34)  ,  it  follows 


bj^yj/qj)  <  ani:(xi/p1) 
a1£(xi/pi)  <  bm£(yj /qj)  , 


16 


Equality  is  not  possible  in  either  (39)  or  (40)  ,  because  at  least  two 
of  the  a^  or  two  of  the  bj  must  differ.  Hence,  (35)  follows  directly 
from  (37),  (39)  and  (40). 


The  proof  for  (36)  is  obtained  by  using  a  classical  inequality 
which  is  stated  in  the  form  of  a  lemma. 

LEMMA  5.  If  two  finite  sequences  of  N  real  numbers,  {uk}  end  {vfc}> 


are  given  with  the  properties  that 

U1  <  u2  ^  •**  <  UN>  V1  ^  v2  <  •••  <  VN> 
then  the  following  inequality  holds 


N  N  N 

£  uk  £  vk  ^  N  £  ukvk.  (41) 


Moreover,  if  the  inequalities  on  the  elements  of  one  of  the  sequences 
are  reversed,  then  the  inequality  sign  in  (41)  is  reversed.  Equality 
for  (41)  holds  if  and  only  if  all  the  elements  of  at  least  one  of  the 
sequences  are  equal. 

Proof:  The  proof  follows  immediately  from  the  identity 


N  N 

X  Uk  X  Vk 


N 


N 

L 

l 


ukvk 


N-l 

£ 

k=l 


N 

L  (u^-  UjKvi  -  vk).  Q.E.D.  (42) 


In  order  to  apply  this  lemma,  we  observe  that  z(t)/q(t)  and 
z(t)/.p(t)  are  monotonically  increasing  and  decreasing  functions  of  t, 
respectively.  This  statement  is  easily  verified  by  differentiating 
these  above  quantities,  using  Lemma  4  and  the  limiting  properties  of 
the  functions  as  t  *►  ±  cc  . 
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We  apply  Lenina  5  twice.  In  one  case,  let 

uj  ■  bj- 
vj  • 

Then,  for  the  point  (a,/3),  the  argument  of  (yj/qj) >  tj  *  bj0  -5, 
is  an  increasing  function  of  b j ,  so  (yj/qj)  is  an  increasing  function 
of  bj.  Hence  by  (34)  and  (41) 

£  b j  X  (y j  /q j  ^  »£bj(yj/q,).  (43) 

In  the  other  case,  let 

ui  *  H> 
vt  -  xt/Pi. 

Then,  for  the  fixed  point  (<X,>9),  the  argument  of  (x^/Pi'^Si  *  a^3  “  a  » 
is  an  increasing  function  of  a^ ,  hence  (xj/pj)  is  a  decreasing  function 
of  a^.  Thus,  by  (34)  and  (41) 

nJCa^xj/pj)  ^  £ai<£(xi/pi) .  (44) 

We  again  use  the  fact  that  either  some  of  the  a.  *  «ome  of  the  bj 
must  be  different,  so  that  by  Lemma  5  either  (43)  or  (44)  must  be  a 
strict  inequality.  For  definiteness,  suppose  (44)  is  strict,  then 
from  (37),  (43),  (38),  and  (44) 

I  £  bj  X (Xj/p,)  -  |  £  bj  X <yj/qj)  ^  X bj  (yj/qj) 

•  Xai(xi/pi)  <  I  X  X  Ut/pi) .  (45) 

•a 

Since  £(xfi/p±)  >  0,  the  inequality  (36)  follows  from  (45). 
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It  remains  to  show  that  (35)  and  (36)  are  sufficient  to  insure 
L  has  a  maximum.  For  the  proof  we  shall  use  Lemmas  6  and  7 . 

LEMMA  6.  The  function  L(ct  ,  0)  has  a  unique  maximum  on  (-00,00) . 
Moreover,  this  maximum  occurs  at  a  =  a*,  where 


p  <-«*)  =  .2 L  ,  q(-a*)  =  P(a*)  -  -g-  .  (46) 

nj+n  nrrn 

Proof:  Equation  (5)  is  expressed  in  terms  of  Ot  and  yff  by  using  (6) 
and  (7)  .  Then  f}  is  set  to  zero  to  obtain  the  expression  for  L(a  ,  0)  , 


L(ct ,  0)  «  n  log  [p(-oe>]  +  m  log[q(-cO]  . 


Now,  differentiating  1(0  ,  0) , 


~  (Ct  ,  0)  *  -  n[z  (-a)/p(-Ct )]  +  m[z  (-«  )/q(-a)]  , 
da 

and  setting  the  result  to  zero  implies  there  exist  values  of  ct  , 


say  a*,  for  which 


However 


[m/q(-a*)]  =  [n/p (-«*)]  • 


q(-cO  =  1  -  p(-Ct)  , 

so  that  (46)  follows,  and  since  p(C O  is  a  monotonically  increasing 
function  of  its  argument,  Ot*  is  unique.  By  Theorem  1,  L^^  is 
always  negative,  hence  L(Ct ,  0)  takes  its  maximum  at  (Ot*,  0). 


V. a<sti  — • ■ 


L0<ct,  0)  -  r*i[z(-flO/pC-a>]  -£bj[z(  -a)/q(-a)] . 
Evaluating  this  quantity  at  Ot  -  of*,  and  using  (46)  and  (36) 

gives 

l0(a*3  0)  -  (m+n)z(of*)  ^  XbjJ  >  0.  Q.E.D.  (47) 

The  basic  idea  that  is  used  to  complete  the  sufficiency  proof  can 
be  briefly  described  as  follows:  A  triangular  domain  T  (see  Figure  1) 
is  constructed  in  the  -plane,  with  the  point  (ct*,  0)  in  the  base 
of  the  triangle  but  not  one  of  its  vertices,  such  that 

L (of.yS)  ^  L(a*,  0)  for  all  <a,/3)  e  dT.  (48) 

Then,  by  Lemmas  6  and  7,  there  exists  a  point  (of*,  f}') ,  an  interior 
point  of  the  domain  T,  with  fl'  >  0,  for  which 

L(of  *,  £')  >  L(0f  * ,  0)  . 

It  will  then  be  easy  to  show  that  the  global  maximum  (Of  ,  yS)  of  the 
function  L  (not  necessarily,  the  same  point  as  (of*,  £'))  must  lie  in 
the  interior  of  T  with  yS  >  0  as  required. 

The  proof  of  the  existence  of  the  triangular  domain  T  follows 
(see  Figure  1) .  Consider  lines  with  equations  of  the  form 

alP  ~  a  “  “tl>  >  0  (49) 
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Since,  by  (5),  L(a  ,/?)  -  X  log  p^  +  X  log  and,  since  all  these 
summands,  logarithms  cf  numbers  on  the  Interval  (0,  1),  are  always 
negative,  we  have,  for  every  point  (a  ,  0)  on  this  line, 

!*(<»* /?)  <  log  Pi  “  log  p(aj0  -  «)  -  log  pC-^).  (50) 

But,  by  the  properties  of  the  probability  integral 

p(x)  =  (1/  Y2  7T  )  f  exp(-t2/2)dt,  p(-tj)  0 
-oo 

and  hence  log  p(- t^)  •>  -oo  as  t^->  «  .  Therefore  we  can  certainly 
give  t^  a  sufficiently  large  positive  value  so  that  L(ot,/?)  <  L(a*,  0) 
for  every  point  of  the  line  (49).  Also,  since  the  Ot -intercept  of  this 
line  is  a  *  t^,  we  can  at  the  same  time  choose  t^  in  such  a  way 
(t^  >  or*)  that  the  a  -intercept  of  the  line  lies  to  the  right  of  the 
point  («*,  0)  .  We  suppose  that  a  fixed  t^  satisfying  these  conditions 
is  chosen,  and  this  line  is  designated  as  line  @  in  Figure  1. 

Similarly  we  can  show  the  existence  of  a  line  (2)  ,  bmj3-0f  *  +t£ 


Figure  1 


.<«•*>  T 

(a*.  0')  * 


(a*,  0) 


with  t2  >  0,  with  its  a -intercept  lying  to  the  left  of  the  point 
(d*,  0),  and  such  that,  for  all  points  of  line  (2)  ,  L(ot,^)  <  L(ot  *,  0) 
It  is  easily  shown  by  analytic  geometry  that  these  lines  a^/3  -  a  =  - 1 ^ 
and  bm^-Ct  =  +t2  must  intersect  in  the  upper  half-plane.  The  solution 
of  these  equations  is  d  =  (t]bm  +  t2a1) /O^-a^)  ,  fi  =  (tj+t^/O^-aj)  . 
Since  we  hare  have  >  0,  t2  >  0,  bffi  >  a^,  we  must  have  ft  >  0 
at  the  point  of  intersection. 

Thus  we  have  a  tri ingle  (Figure  1)  formed  by  segments  of  the 
ot -axis  and  lines  (I)  and  (g)  .  This  compact  domain  (triangle  plus  its 
interior)  is  denoted  as  T,  and  the  interior  lies  in  the  upper  half-plane, 

P  >  0. 

At  all  points  (a  , ft)  of  the  triangle  itself,  the  boundary  of  T,  0T, 


we  have 


L(ct,£)  ^  L(«*.  0) 


since  L(a*,  0)  is  the  maximum  value  for  the  entire  a -axis,  end  since 
lines  0  and  (2)  were  chosen  in  such  a  way  that  L(Ci  ,/3)  <  L(ot*,  0) 
at  all  points  of  these  lines . 

But,  by  Lemmas  6  and  7,  there  must  exist  a  point  (a*,  /?'),  an 
interior  point  of  the  domain  T  with  /3 '  >  0,  such  that 

L(ot*,  P')  >  L(ot*.  0).  (52) 

Combining  this  with  (51),  it  is  seen  that  L(a*,  ft')  is  greater  than 
L(ct  ,  p)  for  all  points  (a  ,  ft)  of  the  boundary  of  T. 


......  j 


However,  the  function  L  must  assume  a  m&ximum  for  the  compact  set  T 
at  some  point  of  T,  by  elementary  real  analysis.  This  maximum  cannot 
be  assumed  on  the  boundary  of  T,  by  what  has  been  shown.  Hence  it 
must  be  assumed  at  some  interior  point,  (a,/?),  with  /3  >  0,  not 
necessarily  the  same  point  as  (<y*,  $’)• 

But,  by  Theorem  3,  there  can  exist  at  most  one  maximum  point  for  L, 
including  local  maxima,  in  the  entire  piane.  Hence  this  maximum  point, 

(a  )  ,  for  the  compact  domain  T,  with  >  0,  must  be  the  unique 
global  maximum.  This  completes  the  proof  of  Theorem  4. 

Our  objective  now  is  to  describe  a  practical  computational  procedure 
by  which  (ot  ,/?)  can  be  determined  to  any  specified  accuracy.  Reference 
will  be  made  to  the  well-known  Newton-Raphson  procedure  (abbreviated  N-R) 
for  two  independent  variables,  [l5,  p.  45 l] .  It  will  be  shown  that  a 
modified  form  of  e  N-R  algorithm  will  always  converge  globally  to 
(.a  ,fi)  ,  i.e.,  regardless  of  what  starting  point,  (a*,/?*),  is  chosen. 

We  remark  that  a*  here  is  a  convenient  notation  and  has  no  relation 
to  the  same  symbol  in  (47) . 

A  point  (a*,/?*)  is  chosen  initially  (assume  it  is  not  (d,/?)). 

The  N-R  algorithm  is  applied,  with  the  objective  of  reducing  and 
to  zero  simultaneously,  to  yield  increments  Aa  and  Afi  and  a  new 
point  (ct  2  ,  &2)  •  ^e  caH  4ct  ana  Aft  ,  N-R  increments.  They  make  up 
the  first  and  secoud  components,  respectively,  of  the  column  vector 
AT]  and  are  found  by  solving  the  linear  system 


(53) 

(54) 


A  01  Ljta  +  A0Lap  "  » 

AotLap  +  ApLpp  =  -  L  p  , 
where  all  partial  derivatives  are  evaluated  at  (a*,  ft*).  Hence 


(55) 


Aa  -  (i,g  Lap  -  La  p p  )  /  A  , 

A  p  =  (L  ^  L  a  p  ~  ^  eta')  /  A  , 

where  A  >  0  as  was  shown  in  Theorem  2.  In  vector  notation  (See  (31)) 


£  (u)  =  *7  +  u  An  ,  0  <  u  <  1 ,  (56) 

where  £fQ)  =  W  =  (Of  *,/?*)>  £  (1)  =  (°t2,i^2  sa^  ^-R 

algorithm  is  "modified"  if  the  continuous  variable  u  is  given  any 

J 

value  on  (0 ,  1)  . 

It  will  be  shown  there  always  exists  a  value  of  u,  say  uQ ,  such 
that  for  all  £  (u)  for  which  0  <  u  u0  $(  1, 


l  [  £  (u)  ]  =  L(a  ,0)  >  L(  a*,  ,3*)  =  l  (rj )  =  l  [£  (0)]  .  (57) 


The  quantities  A  a  and  A0  that  appear  below  are  assumed  to  be  N-R 
increments  and  are  to  remain  fixed  throughout  the  argument.  Then, 
using  the  notation  of  (31)  and  evaluating  derivatives  at  (a*,/?*), 


Vl  •  Arj  m  AciLa  +  AfiLp  >  0.  (58) 

The  inequality  follows  from  (55)  by  substituting  on  the  right  for 
A  a  and  A  0  so  that 


VL 


Arj  =  -  ± 


A  L 


l/3l  aa  -  2LaLpLap  +  LaL00 


(59) 


“  -  7  (gT  •  D2L  •  g)  >  0, 

A 

where  g  =  (L^  ,-La)  . 
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The  last  inequality  clearly  ho Ida,  because,  by  Theorems  1  and  2,  the 
matrix  D^L  is  negative  definite  and  A  xu  always  positive.  Therefore, 
using  the  mean  value  theorem  and  continuity  arguments ,  there  exists  a 
real  number  uQ  €  (0,1]  such  that  (57)  holds.  By  iterating  this  procedure, 
and  choosing  u  appropriately  at  each  step,  Jie  L  values  will  certainly 
converge  to  some  value  not  exceeding  L (fit,/?)  since  L  is  increased  at 
each  step. 

One  difficulty  in  actually  carrying  out  this  procedure  is  that  no 
definite  rule  has  been  given  for  choosing  u  at  each  step.  A  second,  and 
much  more  serious,  difficulty  is  the  possibility  that  the  iterates 
could  spiral  while  the  corresponding  values  of  L  converged  to  a  real 
number  bounded  by  L(Ot  , ft) .  It  will  be  proved  however  that  if  u  is 
properly  chosen  as  indicated  below  that  convergence  will  always  be  to 

(a,g). 

A  well-defined  procedure  is  now  given  for  choosing  u.  For  any 
given  step  take  u  =  2_r  where  r  =  k  or  k  +  1  (k  =  0,  1,  2,  ...)  as 
determined  by 

r  k  if  L  [  £ (2”k) ]  >  L  [  £ (2‘k_1) ] 
r  -  <  .  .  .  (60) 

l  k+1  if  L  [  £(2'k_1)]  >  L  [  £  (2”k)]  , 

where  k  is  the  smallest  nonnegative  integer  for  which 

L  [  £(2-k)]  >  L  [  £(0)]  .  (61) 

In  other  words,  the  full  N-R  step  (u  =*  1)  is  repeatedly  halved  until 

(61)  is  satisfied;  by  (58)  there  always  exists  such  a  k  and  it  may  be 

zero.  Then  one  more  halving  takes  place  so  that  r,  and  thus  u,  is  found 


from  (60).  This  procedure  can  obviously  be  carried  out  on  a  computer 
simply  and  efficiently.  The  next  theorem  summarizes  these  remarks. 
THEOREM  5 .  If  L(ct  ,fi)  is  a  maximum  and  (35)  and  (36)  are  satisfied 
then  the  modified  N-R  algorithm  as  described  in  the  preceding  paragraph 
will  always  converge  globally  to  (ot  , 8) . 

The  part  of  the  theorem  which  remains  to  be  proved  is  somewhat 
lengthy  and  is  given  in  Appendix  A. 

Actually  in  practice,  as  we  describe  in  Section  V,  the  regular 
N-R  algorithm  (u  -  1)  has  been  used  with  complete  success  in  spite  of 
inferences  to  the  contrary  in  the  literature  [14]  ,  []2l].  Many  cases 
of  all  types  were  tried  and  convergence  always  occurred  independent 
of  the  starting  values  of  a  and  /3 .  A  case  was  given  in  J]2l]  in  which 
divergence  was  reported;  our  program  converged.  (We  have  included 
this  case  as  one  of  our  examples  (No.  2A)  in  Section  V) .  We  are  thus 
led  to  the  conjecture  that  the  N-R  algorithm  always  converges  globally, 
provided  sufficient  accuracy  is  retained  in  all  computations,  but  a 
proof  has  not  been  found. 

We  remark  in  closing  this  section  that  on  implicit  assumption  has 
been  carried  throughout.  Namely,  if  L  has  a  maximum  at  (ot  ,  JS)  ,  then 
it  has  a  maximum  for  the  variables  (M  , ° ) ,  where 

£  -  <*/  M  ,  *  =  1/0  .  (62) 

This  assumption  is  clearly  valid,  because  the  variables  are  related  by 
1-1  transformations  (5) ,  (6) .  A  detailed  proof  has  been  carried  out 
but  is  not  included  in  this  report. 
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IV.  CONSTRUCTION  OF  THE  COVARIANCE  MATRIX  AND  CONFIDENCE  ELLIPSE 


We  continue  to  use  the  notation:  fiQ,  oQ  for  the  true  parameter*, 
and  H  ,  o  for  the  estimates  of  the  true  parameters  as  determined  turn 
maximum  likelihood  theory.  The  notation  used  below  follows  closely  that 
of  Golub  and  Grubbs,  [l4]  the  analysis  follows  that  given  by  Mood 
[17,  p.  212]. 

The  estimates  (/i ,  o)  carry  no  significance  without  some  measure 
of  the  possible  deviation  from  ( fiQt  cQ)  »  the  true  parameters  of  the 
distribution.  A  classical  procedure  for  obtaining  an  estimate  of  the 
error  for  large  sample  sizes  is  to  determine  a  confidence  ellipse 
(similar  to  a  confidence  interval  for  one  variable) .  It  will  have 
meaning  in  the  following  sense:  For  a  specified  level  of  confidence, 
say  95%,  an  ellipse  is  determined  in  the  file- plane  with  its  center  at 
(A*  ,  <0  such  that  with  probability  .95,  the  true  parameter  point 
(yUQ,  <7q)  is  contained  in  the  interior  of  the  ellipse. 

We  first  construct  a  so-called  covariance  matrix  and  then  show 
how  the  confidence  ellipse  is  obtained  from  the  elements  of  the  Inverse 
of  this  matrix.  We  will  resort  again  to  the  armor  plate  penetration 
problem  as  an  aid  in  elucidating  the  main  ideas  of  this  section. 

Consider  random  variables  (k  =  1,  2,  ...,  N)  where  N  ■  n  +  m 
is  the  number  of  shells  fired  (or  experiments  conducted) .  Each 
has  only  two  possible  values 

1  if  k  a  shot  was  a  success  (produced  penetration) 

0  if  kth  shot  was  a  failure  (no  penetration) . 
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(63) 


The  probability  density  function  for  the  random  variable  5  k  is 


where  p  is  the  normal  probability  integral,  (see  (3),  (4)).  The  quantity 
ck  is  the  stimulus  (velocity  of  the  projectile  in  our  example)  regardless 
of  whether  the  k  shot  is  a  success  or  failure.  The  quantities 
(mean  critical  speed),  a (standard  deviation)  are  parameters.  The 
probability  fk  (1 ;  jl ,  O )  that  the  k*-*1  shot  was  a  success  is  determined 
by  putting  6k  =  1  in  (64)  and  (65)  to  obtain  p  [(ck  ~  jl)  /  a]  ;  similarly 
for  a  failure  6k  is  set  to  zero  in  (66)  to  obtain  f  k  (0 ;  H  >  CF)  - 

q[(ck  -AO/<r]  • 

Two  new  quantities  are  introduced 

G(6k;/i,<r)  =  Gk  =  log  f(  6k;  n  ,  a)  ,  (66) 

H(6k;  a)  =  Hk  =  log  f(fik;  //,  o')  .  (67) 

Although  6^  is  a  discrete  variable  with  only  two  possible  values, 
each  fk  is  a  differentiable  function  of  the  parameters  /i  and  O, 
so  these  definitions  are  meaningful. 

For  the  sake  of  a  simpler  notation,  we  drop  the  subscript  k  but 
it  should  be  understood,  unless  otherwise  noted,  that  all  relations 
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that  follow  deal  only  with  the  k**1  experiment,  so  that  a  fixed  k  subscript 
should  be  understood  whenever  necessary. 

The  expected  value  of  G  is  given  by 

1  r  r 

E(G>  -  x  i  H,a)£(6  i  ,  (68) 

r-o 

where  ir  are  the  values  which  6  (note  subscript  k  has  been  dropped 
on  fi)  can  take,  i.e.,  6°  «  0,  6*  =  1.  Equation  (68)  is  the  discrete 
analog,  of  the  expression  which  would  be  used  if  we  had  a  continuous 
variable  ^  Instead  of  6 ,  namely 

oo 

G(ff  ;  /i,cr)f(0  ;  //,a)d0.  (69) 

CO 

From  (68)  there  follows: 

e(g)  -  £  *■  £  £(6r;M,o) 

r-o  oU  OM  r=o 

Starting  with  the  analog  of  (68)  for  H  and  dido  ,  instead  of  G  and 
dtdfi  >  it  is  shown  in  the  same  way  that 

E(H)  =  0  .  (71) 

The  fact  that  every  G  and  H  (i.e.,  for  each  k  =  1,  2,  . ..,  N) 
has  a  mean  or  expected  value  of  zero  simplifies  the  calculation  of  the 


E(G (^  ;  // » cr ))  jf 


From  (72) ,  using  (73)  ,  we  have 


-  2  log  f(6r;/i,a) 


Similarly 


cj-  _  „  9 


H  ’  ’ E  \Jd7i 108  f(5  •  0 

Nexr,  we  wish  to  obtain  a  useful  expression  for  the  covariance 
of  G  and  H.  By  definition 


7gh  *  X  G(d  iH,<r)H(6r;M,o)£(6r, 

r-o  ’  ' 


r-o  f(fir:  H  t  O') 


d£  (6r;//,a)  d£(6r;M,<r) 


If  (68)  is  differentiated  with  respect  to  a,  then 

0  "  £  G(5r;//,<7)f(5r; /i,<7) 

v  l_r*o 


„•  1 

f(* 


♦  1— i _ 

r-o  f(dr;n,a) 


j* 

8f(6  ;^,<r)  df(5r;^,<r) 


(77) 


From  (76) ,  using  (66)  and  (77) , 


log  f(Cr;/i,<T) 

°gh-'eL  dtido  J* 

It  is  also  obvious  from  the  definition  given  in  (76)  that 

°GH  “  °YG‘ 


(78) 


(79) 


We  introduce  the  vector  random  variable  (G,H)  for  a  given  k, 
k  =  1,  2,  ...,  N.  By  (70)  and  (71)  this  variable  has  mean  (0,0). 
We  denote  its  covariance  matrix  by  M,  which  is  defined  by 


Equations  (74) ,  (75) ,  (78)  give  expressions  for  its  elements  which 
will  be  used  later.  By  the  multivariate  central  limit  theorem, 

[7,  p.  316] 

l  CGk,  v 

k-l 

is,  for  large  N,  approximately  multinormally  distributed  with  mean 
(0,  . 0)  and  covariance  matrix,  A"^, 

A  X  =  I  M,.. 
k“l 

We  denote  this  matrix  specifically  as  an  inverse,  because  this  matrix, 
as  it  will  be  shown  later,  is  the  inverse  of  the  covariance  matrix 
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33 


34 


where  ,  (//£*  °2 )  816  on  c^e  open  *ine  of  the  straight 

line  connecting  (/i,<7)  and  (.UQ,  <rQ)  in  the  HO- plane. 

Since  (Al,zr)  is  the  point  for  which  L  is  a  maximum 

L  =  L0  (  H,a)  =  0, 

so  that 

1*h(Mq,  *o>  “  (MV  <ri>]  +  (°-  *0)[-l/z<r  (M L,  oj] 

Lcr  (M0*  *  (M  "  +  ^  ^2*  a2^  * 

(89) 

Now  (/Z^,  <Tj)  and  (/z2»  o^)  converge  with  probability  one  to  (UQ,  <rQ ) 
and  the  second  derivatives  on  the  right  hand  sides  of  (89)  converge 
with  probability  one  to  their  expected  values,  [l6](vol.  2,  2nd  Edition, 
page  55) .  Hence  (89)  may  be  regarded  as  a  set  of  linear  equations  in 
the  quantities  ( M~M0 )  and  (  O-  <ro) .  Using  vector  notation,  and 
superscript  T  for  transpose  (89)  becomes 

A-1.(M-  MQ,  o'-  a0)T  ■=  kl6  -  A  *  (LM,La)T,  (90) 

—  -  T 

where  0  -  (M.-MQ,a~  aQ)  > 

and  h n  and  La  on  the  right-hand  side  are  evaluated  at  (M0,  (XQ)  . 

The  matrix  A”1  always  has  an  inverse  provided  and  ( M  Q,  <rj 

are  not  the  same.  This  follows  by  Theorem  2,  since 

Determinant  (A“^)  *  4  >  0.  (91) 
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Hence 


A  *  A  =  0  . 


Where  A  is  approximately  distributed  as  N(0,A  ),(see  page  34).  We  also 
have  by  another  theorem  from  multivariate  analyses,  [22;  p.3j: 

If  X,  a  row  vector,  is  distributed  according  to 

N  (v ,  V) ,  vAiere  v  is  a  vector  and  V ,  as  usual,  is  a  covariance  matrix.. 
then  the  vector  Y ,  given  by 


Y  =  CXx(for  any  nonsingular  square 

matrix  C)  (9 

is  distributed  according  to  N (C v T, CVC^) . 

Now  for  XT  =  A, we  have  that  v  =  0,  V  =  A-^,  and  for  Y  =  6  ,  C  =  A. 

So,  we  conclude  from  the  above  theorem  (although  details  are  omitted) 

that  d  is  approximately  distributed 

N(0 ,  AA"^)  =  N(0,AT)  =  N (0 ,A)  ,  (S 

T  -1  -IT 

where  we  note  that  A  «  A  since  A  -  (A  )  .  Thus,  for 

0  =  (Ji  -  HQ,a  -  of,  (s 

the  maximum  likelihood  estimators  for  large  N  are  approximately 
bivariately  normally  distributed  about  the  true  parameter  point 
(. UQ,  Oq)  in  the  -plane  with  covariance  matrix  A,  the  inverse  of 
A-^  of  (87)  or  (88; .  We  identify  the  elements  of  A  with  superscripts 


kHa 


kna 


The  elements  A ^  and  k?a  are  called  the  asymptotic  variances  of 
H  and  <t  respectively  and  the  asymptotic  covariance  of  fl  and  0  . 

To  determine  the  confidence  ellipse  for  a  given  confidence  level  Y 
we  consider  the  quadratic  form  occurring  in  the  exponent  of  the  joint 
normal  density  function.  If  an  M-dimensional  vector  variable  X  is 
distributed  according  to  N(v,V) ,  the  joint  density  function  f (X)  is 

f(X)  -  - m72~Ti/2~  k  (X  -  v)  *V-1.(X  -  v)Tl  .  (97) 

(2rr)  |vi  L  2  -I 

The  quadratic  form 

(X  -  v)  •  V-1  •  (X  -  v)T  (98) 


has  a  chi-square  distribution  with  M  degrees  of  freedom  since  X  is 
N(v,  V),  [22;  p.  417].  In  our  case,  the  relevant  quadratic  form  is 


From  (88)  and  (95) 


(99) 


6r  •  A'1  -  [a„m  (M-V0)2  +  2Ajiia  ( M-  M0)(o-  aQ) 

+  (?-  <r0)2]  .  (100) 

The  confidence  ellipse  at  the  confidence  level  Y  ,  say  7  =  0.95, 
is  obtained  by  equating  the  expression  in  brackets  in  (100)  toX^_y 
where  X^  is  obtained  from  a  X2,  table,  [19],  [20]  ,  with  two 
degrees  of  freedom  [22;  p.  417].  In  practice,  the  maximum  likelihood 
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estimates  M,  a  are  available,  the  true  parameters  are  unknown. 

So  the  confidence  ellipse 

kptfL  tM-M)2  +  2kM0  +  A  aa  (#  -  a)2-  x\_Y  (101) 

is  plotted  where  M  and  a  are  running  coordinates  on  the  ellipse 
with  center  at  (H  ,  O).  The  positive  real  number  7  denotes  the 
probability  that  the  true  parameter  point  (/lQ,  <* 0)  lies  in  the 
Interior  of  the  ellipse  in  the  H-o  plane. 

The  coefficients  A ,  A k  are  easily  computed  if 
x^/p^  and  (the  notation  of  Section  II)  are  available,  say  from 

a  computing  procedure  for  the  determination  of  ( fi , a  )  in  which  such 
quantities  as  La  ,  Lp  ,  are  needed.  One  computes,  in  addition,  the 
quantities  (x^/q^)  and  (Yj/Pj)  f°r  each  i  and  j.  Then  in  the  notation 
of  earlier  sections  we  have  for  computation  of  the  coefficients  of 


a 


-2 


n  *i2  m  yj2 

£  FT"  +  L  FT  * 

i»i  piqi  j-i  pjqj 


(102) 


_2 

a  k  no 


(103) 


-2 

O  A 


a  a 


n 

L 

i-l 


s  2a  2  m  t  2y  2 

11  +  l  jJ.K  , 


(104) 


piqi  j-1  pJqj 

where  all  quantities  are  evaluated  at  ( O , fi )  or  equivalently  (  M  *  * )  • 
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Our  computer  program  for  the  determination  of  (/*,<?)  includes 
the  calculation  for  confidence  ellipses  at  V  =  0.50  and  7  *  0.95 
with  the  computer  output  including  the  ellipses  in  plotted  form.  This 
computer  program  is  described  in  the  next  section. 

V.  COMPUTER  PROGRAM 

In  this  section  we  describe  the  actual  computer  program  that  is 
used  to  obtain  H  ,0  and  the  associated  confidence  ellipses  as  discussed 
in  the  previous  section.  In  addition,  those  computations  where  a  loss 
of  significant  digits  may  occur  are  noted  and  their  special  treatment 
is  discussed. 

Two  programs  exist  for  use  on  the  IBM  7030  (STRETCH)  computer. 

One  is  written  completely  in  FORTRAN  IV  and  the  other  only  partially 
with  the  remainder  in  STRAP,  the  STRETCH  machine  language.  The  programs 
are  designed,  as  mentioned  previously,  only  for  the  ordinary  Newton- 
Raphson  (N-R)  procedure,  although  it  wuld  be  easy  to  change  the 
programs  to  accommodate  the  modified  N-R  algorithm  as  described  in 
Section  III.  The  programs  as  they  are  now  set  up  with  the  ordinary  N-R 
procedure  have  always  converged  globally. 

We  proceed  with  some  details  of  the  programs , (Reference  is  made 
to  only  one  program  hereafter  since  the  programs  mentioned  above  differ 
only  in  programming  language.).  It  is  assumed  that  the  input  is  specified 
as  two  sets  of  real  numbers  {bj  }  >  where  i  =  1,  ...,  n,  j  =  l,.*.,m 

At  the  outset  the  program  is  called  upon  to  insure  that  the  necessary 


and  sufficient  conditions  for  the  existence  of  a  unique  point  at 
which  L  attains  a  maximum  are  satisfied,  i.e., 

*min  <  bmax  ’  <105> 

1  m  i  « 

a  *  b1  <  «  L  *i»  <106> 

j-1  J  n  i«i 

where  a  .  “  min^)  and  bmfly  =  max(bj) .  By  Theorem  (4)  of  Section  XII 

if  either  (105)  or  (106)  is  not  satisfied  there  do  not  exist  maximum 
likelihood  estimates  H  ,  a  .  If  this  is  the  case,  an  exit  is  made  in 
the  program. 

If  (105)  and  (106)  are  both  satisfied  by  the  input,  the  program 
proceeds  to  obtain  initial  approximations  to  a,  f} .  We  use 

the  following  relations,  although  as  mentioned  above  a*  and  ft*  can 
actually  be  chosen  arbitrarily, 

a*-  ^/ff*-i(ir.1  +  ixbj)/[a;(x.21  +  £b5)-v2]l/!  (107) 

r  \  i'1/2 

[sb(x*i  +  2‘j)  -v2J  -  <108> 

where 

v  ■  sb  (x*i +  £bj)  •  <l09) 

It  is  understood,  as  before,  that  the  sur-  on  i  run  from  1  to  n  and 
those  on  j  from  1  to  m.  Certainly  better  initial  approximations  could 


have  been  obtained  with  more  extensive  analysis,  however,  since  the 
N-E  algorithm  is  a  second  order  procedure  which  has  always  converged 
globally  for  us,  little  need  was  felt  for  such  refinements . 

If  0*^  *hd  denote  the  approximations  to  Ot  and  , 
respectively,  then  the  (k+l)st  approximations  by  N-R  are  specified  by 


a  =  a  + 

k+1  k 

40tk 

(110) 

£k+l  "  £k  + 

k  =  0,  1,  ...» 

(111) 

where  4c*k  and  4/3k  are  given  by  (55)  with  all  partial  derivatives 
evaluated  at  (Cl  k, ^k) .  The  final  form  for  the  computation  of  the 
quantities  that  appear  in  (55)  are  given  by  (115)  -  (119) .  The 
iterations  are  terminated  when 

|  |  <  €1«k»  |40k|  <  €2y?k  (112) 

are  both  satisfied  for  some  k  ^  1.  The  parameters  €^,  are 
prescribed  as  part  of  the  iaput.  They  are  presently  set  in  the 
program  at 

€1  *  2.5  x  10"4  =  i  .  (113) 

Upon  convergence,  the  program  is  set  to  proceed  with  the 
calculation  of  the  matrix  elements  of  A  which  are  needed  for  the 
confidence  ellipses.  The  equation  for  a  confidence  ellipse  is  given 
by  (101).  The  function  is  known  as  the  chi-squared  distribution 

function  with  two  degrees  of  freedom;  it  is  tabulated  for  various 
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values  of  y ,  [l7,  p.  424],  [l9] .  Values  outside  these  tables  can 
be  determined  from  the  Incomplete  gamma  function,  e.g.[20].  A  few 
commonly  used  values  of  for  two  degrees  of  freedom,  are  listed: 


y 

i-y 

X 

A  i-y 

0.50 

0.50 

1.39 

0.90 

0.10 

4.61 

0.95 

0.05 

5.99 

0.99 

0.01 

9.21 

It  was  shown  in  the  previous  section  that  for  sufficiently  large 

samples  one  can  assert,  with  preassigned  confidence  f ,  that  the  true 

parameter  point  (//Q,  0Q)  lies  somewhere  in  the  interior  of  the  ellipse 

o 

given  by  (101) ,  with  y  chosen  appropriately,  whose  center  is  at 
(//,<7).  The  program  at  present  is  set  to  compute  two  ellipses,  one 
for  y  ■  0.50  and  another  for  f  ■  0.95.  They  appear  in  graphical 
form  as  part  of  the  output  (Examples  are  given  at  the  end  of  this 
section) .  We  remark  that  in  order  to  display  these  ellipses  to  advantage 
somewhat  more  than  an  elementary  plotting  code  was  required.  For 
completeness,  the  details  of  the  plotting  code  are  given  in  Appendix  B. 

The  quantities  A ^  ,  A^  ,  koa  that  are  needed  for  the  ellipses 
in  (101)  can  be  determined  from  (103)  -  (105) .  However ,  for  efficiency 
of  calculation,  as  described  in  the  next  paragraph,  the  actual  equa¬ 
tions  used  in  place  of  (103)  -  (105)  are  (120)  -  (122) . 


For  efficiency,  the  program  is  designed  to  take  advantage  of  the 
situation  when  some  of  the  a^  are  repeated,  and  likewise  for  the  b j . 

The  (and  also  bj)  are  sifted  so  that  only  those  a^  which  are  different 
are  listed  and  with  each  such  a^  an  integer  n(i)  is  also  listed  which 
denotes  the  number  of  times  a^  appears  as  an  input.  For  bj  the  corres¬ 
ponding  integer  is  denoted  by  m(j) .  One  can  then  take  advantage  of  the 
fact  that  the  expressions  for  La,  ,  Laa  ,  L  ^  are  linear 

sums  in  quantities  such  as  (x^/p^)  and  (yj/qj)  so  that  these  quantities 
need  only  be  computed  for  the  different  a±  or  bj  and  multiplied  by  n(i) 
or  m(j) ,  respectively.  With  this  point  of  view,  we  introduce  some 
additional  notation  and  re-write  all  the  pertinent  equations  as  actually 
used  in  the  program. 

Let  the  kttl  different  a^  be  denoted  by  a(k)  and  the  rtl*  different 
bj  by  b(r).  Let  K  and  R  denote  the  total  number  of  different  a^  and  b j , 
respectively,  so  that 

K  R 

,  n  =  Z  n(k),  m  =  £  m(r)  ,  (114) 

k=l  r=l 

where  n,  m  have  their  usual  meaning.  The  basic  equations  for  the  program 
then  can  be  written  as  follows: 

R  K 

La  -  £  m(r)(yr/qr)  -  L  n(k)(xk/p.)  (115) 

r=l  k*l 

K  R 

Lo  “  2)  n(k)a(k)(xk/pk)  -  £  m(r)b(r)(yr/qr )  (116) 

k*i  r*l 
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R 


'  act 


r^l  m^r^yr/qP(yr/qr  "  *r) 


^  »00  Cxk/pk)  (sk-Hsck/pk) 


(117) 


£- 

K 

tf 


t -. 


£ 


I' 


I- 


?' 

? 

r 

t 


i 

i 


aP  *  kfL  n<k>a(kHxk/Pk)  K  +  W  +  nCr)b(r)(yr/qr)(yr/qr-tr) 
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L/9/S  *  '  »(r)b2(r)(yr/qr)(yr/qr.tr) 

K 

-  ^  “(k)a2(k)(xk/pk)  (sk  +  ^/?k) 


(119) 
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2  R 
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where 


kfL  n(k)8k(Xk/pk)  (xk/qk)  +  ^  ®(r)t£(yr/pr)(yr/qr)  #  '122) 


®k  *  a(k)y9  -  a 
tr  *  b(*)y?  -  «  . 

The  total  number  of  different  a.  and  b . ,  K+R,  is  limited  by  the  storage 
capacity  of  STRETCH  to 

K+R  <  10,000. 
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Care  must  be  exercised  in  maintaining  accuracy  throughout  the 
calculations.  For  example,  a  straight-forward  computation  of  the 
quantity  (5ff/p)[s  +  x/p]  ,  for  a  given  a±,  will  not  yield  an  accurate 
result  for  large  negative  values  of  s.  The  difficulty  arises  because 
(x/p)  approaches  (-s)  so  that  the  quantity  in  square  brackets  is  subject 
to  a  loss  of  its  leading  digits.  The  result  may  be  subsequently 
multiplied  by  a  large  quantity  x/p  (or  ai(x/p),  a^(x/p))  thus  leading 
to  a  large  error.  This  difficulty  is  easy  to  remedy  by  simply  replacing 
(x/p)  by  an  asymptotic  expansion  whose  leading  term  is  (-s)  and  replacing 
the  quantity  in  square  brackets  by  this  asymptotic  expansion  with  the 
first  term  (-s)  removed..  Similar  care  must  be  taken  with  the  quantity 
(y/q) (y/q-t) .  Losses  in  accuracy  of  this  nature  are  one  possible 
cause  for  the  reported  divergence  of  the  N-R  algorithm.  Another  may 
be  that  most  subroutines  for  computing  probability  Integrals  retain 
very  little  accuracy  over  some  parts  of  the  domain  (-00,00).  For  our 
program, extensive  efforts  were  made  to  compute  probability  integrals 
to  high  accuracy  over  the  entire  domain.  Originally,  a  table  of 
probability  integrals  was  stored  at  equal  increments  in  the  argument 
of  .0025  for  the  interval  [-8,8], and  two  asymptotic  series  in  inverse 
powers  of  the  argument,  each  containing  27  terms,  were  used  to  evaluate 
(x/p)  and  (s+x/p)  for  arguments  larger  in  absolute  value  than  8.  In 
this  Way,  all  individual  terms  were  computed  to  an  accuracy  of  nearly 
13  significant  digits  on  STRETCH  which  uses  14  digits.  Recently,  however, 
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a  method  for  computing  p(s)  (and  q(s))  was  published  by  Cody,  C5] 
which  gives,  on  STRETCH,  at  least  twelve  significant  digits  for  all 
sc  (>00,00).  Cody's  method  was  used  as  a  basis  for  subroutines,  after 
suitable  modifications  for  the  computation  of  s  +  x/p  for  large  values 
of  s,  which  supplanted  the  table  and  two  asymptotic  series  mentioned 
above.  This  resulted  in  a  large  reduction  in  storage  requirements  with 
no  significant  loss  in  accuracy  or  computing  speed. 

The  running  time  for  the  STRETCH  Fortran  IV  program  with  the 
tolerances  chosen  as  in  (113)  averages  about  .007  seconds  per  every 
different  item  of  input,  i.e.,  a(k)  or  b(r) .  Thus  the  average  computing 
time  per  case  is  about  .007(K+R).  The  average  computing  time  for  the 
other  program  which  uses  some  STRAP  language  is  about  ,0055(K+R).  These 
rough  estimates  do  not  include  the  time  for  plotting  the  confidence 
ellipses  which  are  done  off-line. 

The  output  as  generated  off-line  from  a  STRETCH  tape  is  shown  for 
14  cases  at  the  end  of  this  section.  Each  output  sheet  lists,  starting 
in  the  upper  left-hand  corner,  an  identification  number,  e.g.  No.  111B, 
followed  down  the  page  by  a  listing  of  the  different  a^  and  bj  and  the 
number  of  times  each  occurs ,  e.g. , in  Case  1-3-17-70, (p. 61),  a^  =  -4  occurs 
6  times  as  input  and  b(3)  =  -5.5  occurs  4  times.  Toward  the  top  center 
the  maximum  likelihood  estimates  fi  ,  O  are  identified  as  mu  and  sigma, 
respectively.  Below  these  quantities,  the  elements  of  the  covariance 
matrix  are  recorded  followed  by  the  starting  values  and  H* ,  a* 

which  are  identified  by  alpha*,  beta*,  mu*,  sigma*,  respectively.  Then, 
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a  listing  of  the  Newton-Raphson  increments,  for  each  successive  iteration, 
Ad  »  Afi  *  written  as  delta  alpha,  delta  beta,  are  given  as  well  as 
the  associated  value  of  F  (The  column  is  indicated  incorrectly  with 
L  instead  of  F  where  F  is  given  by  (2)).  Finally  the  two  confidence 
ellipses  are  shown,  at  the  9J%  and  50%  confidence  levels,  in  thf. 

H a -plane.  These  ellipses  are  constrained  to  a  circumscribed  square. 

The  plotting  was  carried  out  so  that  the  two  axes  of  the  ellipses  lie 
on  the  diagonals  of  the  square.  The  details  of  this  graphical  construc¬ 
tion  are  given  in  Appendix  B. 

The  various  cases  which  are  used  as  examples  are  drawn  from  shell 
penetration  tests  or  biological  experiments.  In  a  number  of  the  cases, 
the  confidence  ellipses  include  regions  where  a  is  negative.  This 
can  probably  be  interpreted  to  mean  that  the  sample  size  is  too  small 
for  approximating  a  "large"  sample.  It  is  recalled  from  the  previous 
section  that  the  analysis  for  confidence  ellipses  was  based  on  the 
hypothesis  that  N,  the  sample  size,  approached  infinity. 

Some  cases  are  duplicated  to  give  emphasis  to  the  fact  that  N-R 
appears  to  converge  globally.  In  particular,  case  No.  2A  was  reported 
in  [2lJ  to  diverge.  The  results  of  case  2A  are  shown  for  four  different 
sets  of  starting  values.  These  results  clearly  show  ;oi»vergence . 

Case  No.  3  was  taken  from  £l4]  .  No.  Ill  and  117  are  taken  from 
NWL  shell  penetration  tests.  No.  1  is  taken  from  [18}.  Our  results 
agree  very  closely  with  theirs.  Case  Jan.  19;  1970,  obtained  from  £4], 
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was  used  to  compare  our  results  with  those  obtained  from  Finney's 
probit  analysis.  (Appendix  C  is  devoted  to  a  discussion  of  probit 
analysis  as  an  alternative  method  for  obtaining  estimates  of  fiQ  And  <tq.) 
Case  No.  386  was  supplied  by  Dr.  Marlin  Thomas.  The  confidence  ellipses 
for  this  case  also  suggest  that  there  is  insufficient  data  for  the 
asymptotic  analysis  of  the  previous  section  to  apply.  The  remaining 
cases  are  included  to  give  further  indications  of  the  global  convergence 
properties  of  the  N-R  algorithm. 


DISTANCE  BETWEEN  MU  TICK  MARKS  . 2E-00 


.E‘r*41»K*0«  SIOMAr  .«W#ielE«02 


COVARIANCE  MATRIX  .8«6H6«E»06  .2«2343E+ce 


DISTANCE  BETWEEN  MU  TICK  MARKS  1 .0E«02 


DISTANCE  BETWEEN  610  TICK  MARKS 
.28*01 


-.vr'Gft  ><  *~n  fa* 


VI.  REFERENCES 


[1]  Anderson,  T.  W. ,  An  Introduction  to  Multivariate  Statistical 
Analysis.  John  Wiley  and  Sons,  Inc.,  New  York,  1958. 

[2]  Apostol,  T.  M. ,  Mathematical  Analysis.  Addison-Wesley  Publishing 
Co.,  Inc.,  Mass.  1958. 

[3]  Brownlee,  K.  A.,  Hodges,  J.  L.,  Jr.,  and  Rosenblatt,  M. ,  The 
Up  and  Down  Method  with  Small  Samples.  J.  Amer.  Statist.  Assoc. 

48,  1953,  p.  262 

[4]  Ciuchta,  H.  P.,  Vick,  J.  A.,  &nd  Manthei,  J.  H.,  Dose-Response 
Relationship  of  Crude  Cobra  Venom  (Naia  naja)  in  the  Dog. 

Edgewood  Arsenal  Technical  Report  4074,  Medical  Research  Laboratory, 
Research  Laboratories,  Edgewood  Arsenal,  Maryland,  21010,  March  1967. 

[5]  Cody,  W.  J.,  Rational  Chebvshev  Approximations  for  the  Error 

v 

Function.  Math.  Comp.,  23,  1969,  pp.  631-637 

\ 

[6]  Cornfield,  J.  and  Mantel,  N. ,  Some  New  Aspects  of  the  Application 
of  Maximum  Likelihood  to  the  Calculation  of  the  Dosage  Response 
Curve,  J.  Amer.  Statist.  Assoc.,  45,  June  1950,  pp.  181-210. 

[7]  Cramer,  H. ,  Mathematical  Methods  of  Statistics,  Princeton 
University  Press,  Princeton,  New  Jersey,  1946. 

[8]  Dixon,  W.  J.  and  Mood,  A.  M. ,  A  Method  for  Obtaining  and  Analyzing 
Sensitivity  Data.  J.  Amer.  Statist.  Assoc.,  43.,  1948,  pp.  109-126. 

[9]  Epstein,  B.  and  Churchman,  C.W. ,  On  Statistics  of  Sensitivity  Data. 
Ann.  Math.  Statist,  15,  1944,  pp.  90-96. 

63 


V~-  a  >'li a 


[10]  Finney,  D.  J.,  Probit  Analysis.  Cambridge,  At  the  University 
Press,  Second  Edition,  1952. 


f 

l 


e,-'  ■ 

i 


% 


! 

! 

i 

I 

? 

1 

\ 

\ 

! 


t 

! 

i 


[11]  Finney,  D.  J.,  and  Stevens,  W.  L. ,  A  Table  for  the  Calculation 
of  Working  Probits  and  Weights  in  Probit  Analysis.  Biometrika, 

35,  1948,  pp.  191-201. 

[12]  Fisher,  R.  A.,  On  an  Absolute  Criterion  for  Fitting  Frequency 
Curves,  Mess,  of  Math.,  41.,  1912,  p.  155. 

[13]  Garwood,  F.,  The  Application  of  Maximum  Likelihood  to  Dosage- 
Mortalitv  Curves.  Biometrika,  33,  1941,  pp.  46-58. 

[14]  Golub,  A.  and  Grubbs,  F.,  Analysis  of  Sensitivity  Experiments 
when  the  Levels  of  Stimulus  Cannot  be  Controlled,  J.  Amer.  Statist. 
Assoc.,  51,  1956,  pp.  257-265. 

[15]  Hildebrand,  F.  B.,  Introduction  to  Numerical.  Analysis,  McGraw-Hill 
Book  Co.,  Inc.,  New  York,  1956. 

[16]  Kendall,  M.  G. ,  and  Stuart,  A.,  Advanced  Theory  of  Statistics. 

3  vols,  Charles  Griffin  and  Co.,  Ltd.,  London,  1958-1969. 

[17]  Mood,  A.  M. ,  Introduction  to  the  Theory  of  Statistics,  McGraw- 
Hill  Book  Co.,  Inc.  New  York,  1950. 

[18]  Moore,  R.  H. ,  and  Zeigler,  R.  K. ,  Using  Nonlinear  Least-Squares 
Methods  for  Quantal  Response  and  Sensitivity  Analyses,  Minimum 
Chi-Square  Estimation,  and  Differencial  Equations.  Los  Alamos 
Scientific  Laboratory  of  the  University  of  California  #LA-3763, 

Los  alamos.  New  Mexico,  1967. 


[19]  Pearson,  E.  S.  and  Hartley,  0,  H.,  (Editors),  Biometrika  Tables 


for  Statisticians.  Vol.  1,  Cambridge  Univ.  Press,  Cambridge, 
England,  1954,  pp.  130-131. 

[20]  Pearson,  K.  (Editor),  Tables  of  the  Incomplete  Gamma  Function. 
Biometrika  Office,  University  College,  Cambridge  Univ.  Press, 
Cambridge,  England,  1934. 

[21]  Rothman,  D.,  Alexander,  M.  J. ,  and  Zimmerman,  J.  M. ,  Maximum 
Likelihood  Estimates  of  Cumulative  Normal  Response  Functions 
Program,  North  American  Aviation  Report  #NAR50626,  Rocketdyne 
Division,  Canoga  Park,  California,  1966. 


[22]  Scheffd',  K.,  The  Analysis  of  Variance.  John  Wiley  and  Sons,  Inc 
New  York,  1963. 


APPENDIX  A 


COMPLETION  OF  PROOF  FOR  THEOREM  5. 

It  remains  to  prove  that  the  sequence  of  points  in 

={(<*£.,  /3^)  }  >  which  is  generated  by  the  modified  N-R  procedure 
(See  pages  23-26)  actually  converges  to  Q  =  (<*,y3),  the  point  for  which 
L(Q)  =  L(CI,0)  takes  its  maximum  value,  L.  It  will  be  helpful  in  the 
analysis  belcw  to  think  of  a  point  QCE^  as  a  vector  in  which  has 
the  component  form  (at/5). 

In  the  discussion  of  Theorem  5,  in  the  main  text  of  this  report, 
a  parameter  u^  was  associated  with  each  modified  N-R  iterate,  Q^.  The 
parameters  u^  are  positive  real  numbers  defined  by  (60)  and  (61)  such 
that  for  any  ^  Q 

.  0  <  u1+1  ^  1,  (123) 

L(Qi+l)  >  L(Q1),  for  i  =  1,  2,  ...,  (124) 

where  Is  generated  by  the  vector  relation 

Vi  -  «t+  “i+1  •  <125) 

with 

[4Qt]  *  Mat,  Aflj).  (126) 

The  quantities  Aoi^  and  A{$  ^  are  the  ordinary  N-R  increments  which  are 
obtained  from  (55)  with  all  the  derivatives  which  appear  on  the  right 
hand  side  of  (55)  evaluated  at  =  (01^,  .  A  positive  parameter, 

hj,  is  associated  with  each  consecutive  pair  of  elements,  {  Qj>  } 

of  the  sequence  {Q^}  which  is  defined  by  the  relation 
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hj  S  L(Qj+1)  -  L(Q.)  >0,  j  =  1,  2 .  (127) 

The  hj  are  positive  for  every  j.  This  is  assured  by  (59)  which  implies 
there  always  exists  a  uj»  as  determined  by  (60)  and  (61),  so  that  (123) 
and  (124)  hold  for  every  j. 

Clearly  the  procedure  described  can  be  applied  to  any  point  P  € 
(except  with  a  new  point  generated  by  (125)  and  (126)  with 

an  associated  positive  increase  in  L  indicated  by  h  as  defined  by  (127). 

For  easy  reference,  a  result  which  we  need  is  stated  in  the  form  of 
a  lemma. 

LEMMA  (A-l)  -  The  sequence  { h^ }  generated  by  the  modified  N-R  procedure 

converges  to  zero,  i.e. 

lim  h.  =  0.  (128) 

i  ■>  oo 

Proof:  We  use  the  simpler  notation  for  L(Q^)  =  L(ct^,;8^). 

Then,  for  k  ^  1, 

Lk+l  =  Lk+hk  =  Ll  +  hl  +  h2+”'  +  hk<L 

OO 

Hence,  the  infinite  series  £  h.  of  oositive  terms  is  convergent, 

i*l  1 

which  implies  (128).  Q.E.D. 

LEMMA  (A- 2) .  Given  an  arbitrary  positive  real  number  K,  there  exists  a 
quadrilateral  in  the  C&fi -plane,  with  the  origin  in  its  interior,  such 
that  L(ct  ■  l>  <  -*  for  every  point  (a.*  ft)  which  is  exterior  to  the 
quadrilateral . 
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Proof:  Tue  construction  used  here  will  be  similar  to  that  used 

follo'fing  Lemma  7  (See  page  20).  Indeed,  we  can  always  construct  a 

triangle  here  with  the  desired  properties  if  it  is  possible  to  find 

three  subscripts  r,  s,  t  such  that  ar  <  bg  <  afc,  or  such  that 

b  <  a  <  b  .  But  this  cannot  always  be  done,  for  instance  if  we 
rat 

have  a^  *  b^  =  b^  <  a^  ”  b^,  and  if  we  have  only  these  five  stimuli. 
But,  in  cases  of  interest,  we  always  have  and  b^  <  a^,  by 

(35),  and  the  four  stimuli  appearing  in  these  inequalities  form  the 
basis  of  the  quadrilateral  construction. 

Let  the  positive  constant  K  of  the  lemma  be  assigned.  We  determine 
a  positive  number  s  such  that  p(-s)  =  exp  (-K-1).  Such  a  number  s 
exists  because  of  the  monotonic  increase  of  p(t)  from  0  to  1  on  the 
interval  (-00,00).  Now  consider  the  line  a^  p-a  -  -s  in  the 
a /3 -plane.  This  is  denoted  as  line  (J)  in  Figure  2.  At  every 
point  of  line  Q)  we  have  p( a^/?  -Ct)  =  p(-s)  =  exp(-K-l),  or 
log  p(a^^3  -  ct)  =  -K-l  <  -K.  But  since,  by  (5),  L  can  be  expressed 
as  a  sum  of  negative  terms,  we  have,  for  every  (0t,y8)  on  line  (p  , 
L(o,y3)  <  log  p^  =  log(a^j3  -Ot)  =  log  p(-s)  =  -K-l  <  -K.  This  is 
similar  to  the  corresponding  analysis  on  pages  20-22.  The 

Ot-intercept  of  line  ©  ,  a^/3  -  Ct  *  -s,  is  Ot  =  +s. 

Now  suppose  ( ot  *  /3 )  is  a  point  to  the  right  of  line  ©  .  By  this 
we  mean  that,  if  (ctQ,^)  is  the  point  of  line  ©  with  the  same 
ordinate  fi  ,  then  <t  >  ct0-  The  phrase  "to  the  left  of"  will  have 
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an  analogous  meaning.  A  line  of  the  form  a^/S  -Ct  =  c  or  b^/f-Ot  =  c 
cannot  be  horizontal,  since  every  a^  or  is  finite.  The  point  (dt0) 
to  the  right  of  line  Q  will  lie  on  a  line  of  the  form  a^0  -  d  -  -s-e 
with  d -intercept  a  =  sfe,  where  e  >  0,  and  for  this  point  the  log  p^ 
term  will  be  log  p  (-s-e)  <  log  p  (-s)  =  -K-l  <  -K,  and  so,  for  this 
point,  we  have,  as  above,  L(ctf^J)  <  -  K. 

Thus,  for  every  point  in  the  d0 -plane  which  is  on  or  to  the 
right  of  line  Q  ,  we  have  L(ct  ,0)  <  -K. 

Similarly,  designating  the  line  b ^0-d  =  +s  with  Ct-intercept 
-s,  as  line  ©  (see  Fig.  2),  we  find  that,  for  every  point  on  line  ©  , 
V  =  q(bm/3-a  )  =  q(+s)  =  p(-s)  =  exp ( -K-l) ,  since  the  identity 
q(t)  -  p(-t)  holds  for  all  real  t.  Hence  we  have,  as  above,  L(d,0) 

<  log  q^  =  -K-l  <  -K.  Also,  any  point  (dt0)  which  is  to  the  left 
of  line  (5)  is  on  some  line  b  /S-Ct  =  +s+e  with  Ct-intercept  -s-e 
where  e  >  0,  and  for  such  a  point  the  log  term  will  be 
log  q  (s+e)  <  log  q  (+s)  =  -K-l  <  -K,  since  the  function  q(t),  or 
l~p(t),  decreases  as  t  increases.  Therefore,  for  such  a  point  ( d,0 ), 
to  the  left  of  line  @  ,  we  have  as  above,  L(d,0)  <  -K.  Thus, 
for  every  point  in  the  d0  -plane  which  is  on  or  to  the  left  of  line  ©  , 
we  have  L(tt,/?)  <  -K. 

These  lines  ©  and  ©  ,  with  equations  a^0  - d  =  -s  and 

b  B -  Ct  *  +s  respectively,  intersect,  by  analytic  geometry,  at  the 
m 

point  a  =  (b  +  a.)  s/(b  -  a.),  B  =  2s/ (b  -a-).  Since  s  >  0,  and 
mi  ml  mi 

b  -  a.  >  0  (since  a,  <  b  ),  this  point,  designated  as  C  in  Figure  2, 
ml  l  m 

is  in  the  upper  half-plane,  with  0  >  0. 
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We  next  make  a  very  similar  analysis  using  lines  ©  anc*  ©  » 

Fig.  2,  where  line  (3)  by  definition  is  the  line  a^fi  -  Ct  =  -s  and 

line  (?)  is  b^/3  -  Ot  =  +s.  These  lines  intersect  at  the  point  marked  I) 

in  Fig.  2,  with  coordinates  Ct  =  -  (a  +  b, )s/(a  -  b,),  8  *  -2s/ (a  -  b  ). 

nini  ni 

Since  s  >  0  and  a^  -  b^  >0  (since  b^  <  an  by  (35)),  j3  <  0,  or 
the  point  D  is  in  the  lower  half-plane.  It  is  true,  as  in  the  foregoing 
analysis,  that,  for  every  point  (Ct,/ 5?)  on  or  to  the  right  of  line  (3)  , 
we  have  L(ot,y5)  <  -K;  and  that,  for  every  point  (ot,/S)  on  or  to  the 
left  of  line  (?)  ,  we  also  have  L(ct,/S)  <  -K.  We  omit  the  details 
here  as  the  proofs  exactly  parallel  those  for  lines  (l)  and  (2)  . 

Thus  we  obtain  the  quadrilateral  ACBD  of  Fig.  2,  where  A  and  B 
as  shown  in  the  figure  are  the  points  (s,0)  and  (-s,0),  and  C  and  D, 
in  the  upper  and  lower  half-planes  respectively,  are  the  points  whose 
coordinates  have  been  given.  This  quadrilateral  reduces  to  a  triangle 
in  some  cases,  for  instance  when  <  a^  <  with  a^  >  (b^  +  b^)/2 
and  there  are  no  other  stimuli,  so  that  a^,  -  a^  and  lines  (JL)  and  (3) 
coincide.  But,  even  in  such  cases,  A(s,0)  and  B(-s,0)  are  as  shown  in 
Fig.  2,  and  C  and  D  are  in  the  upper  and  lower  half-planes  respectively. 

We  can  in  every  case  refer  to  the  quadrilateral  ACBD  with  the  under¬ 
standing  that,  in  some  cases,  A  (or  B)  may  be  collinear  with  C  and  D. 

It  is  now  not  difficult  to  see,  on  the  basis  of  the  foregoing 
analysis,  that,  for  every  point  (Cf,/3)  in  the  plane  which  is  exterior 
to  this  quadrilateral,  we  have  L ( ot  ,  /3  )  <  -K,  as  stated  in  Lemma  (A-2). 
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The  reader  can  easily  convince  himself,  by  making  a  copy  of  Fig.  2 
and  shading  those  parts  of  the  plane  where  the  relation  L(a  ,/9)  <  -K 
holds,  that  every  such  exterior  point  (ot,£)  satisfies  at  least  one 
of  the  following  conditions:  it  lies  (1)  to  the  right  of  line  Q)  , 
or  (2)  to  the  left  of  line  (2),  or  (3)  to  the  right  of  line  (5), 
or  (4)  to  the  left  of  line  @ •  Any  one  of  these  geometrical  conditions 
is  sufficient  to  ensure  that  the  relation  <  ~K  holds.  This 

completes  the  proof  of  Lemma  (A-2) . 

Now  we  define  a  point  set  M  in  the  -plane  by 

M  =  {Q  |  L(Q)  >  Lx}  , 

where  is  determined  by  choosing  a  starting  point  (Ct^,  /3^)  = 
for  the  modified  N-R  procedure.  Hie  symbol  Q  here,  however,  refers 
to  any  point  in  the  plane  satisfying  the  inequality  L(Q)  ^  L^,  whether 
a  member  of  any  particular  sequence  of  N-R  iterates  or  not. 

LEMMA  (A-3) .  The  point  set  M  is  closed  and  bounded  in  E^. 

Proof:  M  is  bounded  by  Lemma  (A-2),  since  one  can  take  L^  as  the 
negative  number  -K  of  the  lemma,  and  construct  a  quadrilateral  such 
that  L(Ot  ,0)  <  L^  for  every  point  which  is  exterior  to  the  quadri¬ 
lateral.  Hence  the  set  M  is  contained  in  the  quadrilateral  plus  its 
interior,  i.e.  in  a  bounded  subset  of  the  plane. 

The  set  M  is  closed,  because  if  a  sequence  of  points  in  M  converges 
to  a  point  Q1 ,  we  must  have  L(Q')  ^  L^,  since  L  is  a  continuous  function 
of  its  arguments  ct  and  .  Hence  Q' 6  M  by  the  definition  of  M  and  it 
has  been  shown  that  M  is  closed.  Q.E.D. 
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Now  consider  an  infinite  sequence  {Qj}  of  distinct  modified  N-R 

iterates  with  a  starting  point  Q. .  The  fact  that  all  of  the  Q4  are 

distinct  follows  from  the  fact  that  L  is  increased  at  every  step. 

For  all  j,  Q ^  €  M.  Hence  by  the  Bolzano-Weierstrass  theorem  } 

has  at  least  one  accumulation  point,  say  Q.  Since  M  is  closed,  Q€M. 

But  Q  need  not  be  a  member  of  the  sequence  {Qj}*  However,  the  modified 

N-R  procedure  can  be  applied  at  Q  (or  any  other  point),  from  which 

an  associated  h  would  be  obtained.  The  quantity  h  >  0  unless  Q  -  Q, 

in  which  case  h  =  h  =  0.  This  is  easy  to  see  because  if  Q  =  Q,  then 

from  (55),  since  La  (Q)  =  lyj(Q)  =  0,  Act  =  Af3  =  0,  so  that 

4Q  =  0  in  (125)  and  Q  is  not  changed  by  the  procedure.  Hence  n  =  0. 

So  we  assume  Q  ^  Q,  consequently  h  >  0.  It  is  always  possible 

to  choose  a  subsequence  {Q,  }  of  {Q, }  such  that  Q.  ->*  Q.  Hereafter, 

j  w  J  J  w 

for  typographical  convenience,  we  note  the  elements  of  the  subsequence 
by  where  it  is  understood  w  takes  the  integer  values  We  will 

show  the  assumption  h  >  0  leads  to  a  contradiction  of  Lemma  (A-l). 

The  conclusion  will  follow  that  Q  =  Q,  and  that  the  entire  original 
sequence  {Qj}  converges  to  Q. 

A  visual  aid  which  we  call  an  overlaid  diagram  will  be  used  to 
elucidate  the  remainder  of  the  proof;  it  is  briefly  defined  and 
illustrated.  Pass  a  vertical  plane  V  through  Q  and  the  point  Q  +  u4Q 
(a  +  u4fl  ,  p  +  uAf})  into  which  Q  is  transformed  by  the  modified 
N-R  process.  The  intersection  of  V  and  the  L(a  , ft )  surface  is  a 


concave  downward  curve  since  the  surface  itself  is  concave  downward 


everywhere.  This  curve  is  indicated  as  the  solid  curve  in  Figure  3 
L 


V  plane 


Figure  3  \ 


with  Q  at  the  origin.  Points  on  the  horizontal  axis  are  at  a  height 
|L(Q)j  below  (because  L  <  0)  the  Ct£ -plane.  Points  on  the  solid 
curve  can  be  specified  by  appropriate  values  of  the  parameter  u.  Now 
consider  a  point  of  the  subsequence,  C^,  which  for  large  w  is  very 
near  Q.  Since  L  and  all  of  its  derivatives  are  continuous,  the 
corresponding  curve  determined  by  and  appropriate  values  of  u' 

(we  use  u'  here  instead  of  u  to  denote  a  distinction  from  the  parameter 
u  associated  with  Q),  in  general,  lies  in  a  slightly  different  vertical 
plane  from  V,  but  will  nevertheless  be  very  near  the  Q  curve  by  continuity. 
Now  we  think  of  the  curve  as  translated  in  its  plane  until  the 
projection  of  the  point  on  the  V  plane  falls  exactly  on  the  point  Q. 

The  translated  Q  curve  is  then  projected  on  to  the  V  plane  and  is 


shown  as  the  dashed  line  in  the  figure.  The  distances  in  the  AtyJ-plane 
corresponding  to  u  »  1  and  u'  =  1  are  not  exactly  equal  in  general,  but 
are  nearly  equal  for  large  w  by  continuity.  Figure  3  will  be  referred 
to  as  the  overlaid  diagram.  The  remaining  arguments  will  assume  the 
curves  are  in  the  V  plane  since  the  actual  curves  and  their 
translated  projections  on  the  V  plane  can  be  made  to  differ  by  as 
little  as  we  desire  by  choosing  w  sufficiently  large. 

There  are  two  situations  to  consider  which  are  distinguished  by 
whether  or  not  there  exists  a  non-negative  integer  k  for  which  the 


quantity 


K  2=  L(Q  +  u4Q)  -  L(Q) 


is  zero,  where  u  *  2  ,  and  as  usual  4Q  25  (  Act,  Ap)  with  Act »  Ap 

the  ordinary  N-R  increments  at  Q  as  obtained  from  (55). 

If  K  ^  0  for  any  k  ^  0,  then  no  difficulty  arises  because  it  is 
easy  to  argue  from  continuity  that  the  subsequence  {hw}  converges  to  h. 
However  by  Lemma  (A-l),  h  =  0,  which  obviously  implies  Q  =  Q  by  the 
arguments  above. 

If  K  =  0  for  some  k,  say  for  definiteness  k  =  l(u  =  %)  a  more 
subtle  argument  is  required  to  show  h  =  0.  There  are  two  possibilities 
to  consider  for  the  subsequence  of  iterates  {(^  } 

(a)  All  but  a  finite  number  of  the  have  the  property  that 

WQ»+  I  4V  '  UV  >  0  •  <129) 
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(130) 


(b)  An  infinite  number  of  the  have  the  property  that 

U.%  +  \  -  L<V  <  0  , 

(This  does  not  exclude  the  possibility  that  an  infinite  number  of  the 
may  also  satisfy  (129)). 

The  overlaid  diagram  is  useful  here  to  visualize  these  situations 
keeping  in  mind  that,  for  sufficiently  large  w,  the  points  AQ^ 

are  arbitrarily  near  Q  +  j  4Q. 

For  case  (a) ,  the  arguments  are  essentially  the  same  as  those 
used  above  for  K  ^  0.  Briefly,  since  (129)  holds  for  all  w,  when  w 
is  sufficiently  large. 


0  <  h  -  lim  [UQ^  +  l  A%)  -  U%) ]  =  Hm  (hw). 


(131) 


W->eo 


W->»oo 


where  the  factor  (  ^  )  occurs  because  K  =  0  f^r  k  =  1  so  that  by 

1 


(60)  and  (61)  we  require  r  =  k  +  1  where  u  =  2 


-r 


4  * 


Again  we 


arrive  at  a  contradiction  by  employing  Lemma  (A-l) ,  since  it  requires 
lim  h  =  0.  We  note  the  first  equality  in  (131)  must  hold  because 

w-9*oo  W 

converges  to  Q,  L  is  continuous  in  Q  and 

h  =  [L(Q  +  \  4Q)  -  L(Q)]  .  (132) 


The  second  equality  must  hold  for  (131),  because,  for  sufficiently 
large  w,  the  quantity 

+  'l<V 

approaches  zero  (since  K  =  0) ,  so  that  the  factor  %  must  eventually 
be  changed  to  %  as  w  increases. 
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For  case  (b),  we  assume  that  (130)  holds  for  an  infinite  number 


of  elements  of  •  We  identify  this  subsequence  by  where 

z  takes  the  integer  values  w  .  By  the  modified  N-R  procedure,  the  h 

z  z 

for  each  element  in  { Q  \  will  be  the  larger  of  L(Q  +  y  A  Q  )  -  L(Q  ) 

v.  Z  J  Z  Z  '  Z 

and  L(Q  +  r-  4q  )  -  L(Q  ).  If  the  latter  is  the  larger  of  the  two, 
z  o  z  z 

then  {hz}  will  converge  to  L(Q  +  ^  A Q)  -  L(Q).  So  that  by  (132) 
and  Lemma  (A-l) , 


0  <  h  <  lim  h  =  0  , 

Z->-  oc  Z 


which  again  leads  to  the  desired  contradiction.  Certainly  if 

L(Q  +  |  A  Q)  -  L(Q)  >  L(Q  +  |  A  Q)  -  L(Q)  ,  (133) 

then  for  sufficiently  large  w  the  above  situation  will  hold.  On  the 
other  hand,  if  this  inequality  is  reversed  or  equality  would  hold, 
then  {hz}  would  converge  to  h  for  which  a  contradiction  will  again 
follow. 

The  situations  described  by  cases  (a)  and  (b)  exhaust  the  possi¬ 
bilities  of  what  may  occur  for  the  subsequence  {c^}  .  In  every  case 
we  have  shown  h  =  0,  which  implies  Q  =  Q.  By  elementary  analysis  it 
will  now  follow  that  the  entire  original  sequence  {  Q ^  }  must  converge 
to  the  unique  solution  point,  Q.  Indeed,  suppose  this  is  not  the  case. 
Then  there  exists  an  open  circular  region  R,  with  center  at  Q,  such 

that  an  infinite  number  of  Q.  lie  outside  R  and  hence  in  the  closed 

3 
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APPENDIX  B 


ANALYSIS  FOR  GRAPHS  OF  CONFIDENCE  ELLIPSES 
The  equation  of  the  confidence  ellipse  for  a  given  confidence 
level  1  -  7  (for  example,  y  =  .05  for  95%  confidence)  is 

(H-M  )2  +  2A  fja  (m  -o)  +  ka<J  (a  -  a)2  »  D=  (134) 

where  ( )  is  the  computed  maximum  likelihood  estimate  for  the 
position  of  the  true  parameter  point,  (  fi  q,  i7q)  in  the  fl  a  -  plane. 

The  quantity  D(or  X ^  )  is  the  value  obtained  from  a  chi-square  table 

at  the  y  level,  for  two  degrees  of  freedom  and  A^  ,  A^  ,  A  a<5  are 
the  elements  of  the  inverse  of  the  covariance  matrix  which  is  determined 
by  the  main  program.  The  equations  for  A  /j^  ,  A  fj_a  ,  A  ^  a  are  given 
as  equations  (102),  (103),  (104),  or  as  actually  used  in  the  computer 
program,  they  are  given  by  (120),  (121),  (122).  The  values  of  D  for 
95%  and  50%  confidence,  the  values  at  which  the  NWL  program  is  presently 
set,  are  5.99  and  1.39,  respectively,  (See  p.  42). 

We  emphasize  here  the  necessity  of  distinguishing  tatween  the 
coefficients  A i  ,  A^j  of  (134)  and  the  elements  A ^ ^  , 

A° a  of  the  covariance  matrix  A.  Let  E  denote  the  matrix  associated 
with  the  ellipse,  so  that 


where 


A/l<7  \ 

f 

(135) 

HO 

A  O  0  ) 
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(136) 


A  * 


A**\ 

) ( C o var ianc e  matrix). 

k°°  1 


The  elements  of  E  are  computed  in  the  main  program  and  those  of  A  by 
numerical  inversion  of  E.  Hence,  it  is  not  necessary  to  reinvert  A 
(the  elements  of  which  are  printed  out  by  the  program)  in  order  to 
compute  E. 

We  simplify  the  notation  by  writing  (134)  in  the  form 

ax^  +  2bxy  +  cy^  =  D  »  fl37) 

where 


a  =  ,  b  =  k pa  ,  c  =  kaa  ,  x  =  (M-M),  y  -  (O-O).  (138) 


From  (102)  and  (104),  clearly  a  and  c  are  positive,  and  it  can  be  shown 
by  the  methods  used  in  Theorem  2  that 


A  =  ac  -  b2  >  0.  (139) 

Thus,  (134)  represents  an  ellipse  for  any  positive  value  of  D. 

Our  first  objective  is  to  establish  the  points  of  maximum  and 
minimum  ordinates  and  maximum  and  minimum  abscissas  on  the  ellipse. 

This  is  easily  accomplished  by  equating  successively  to  zero  (dy/dx) 
and  (dx/dy)  as  determined  from  (137).  Substituting  the  linear  relation 

y  =  -ax/b,  (140) 

which  remains  into  (137),  one  obtains 


2 

x 


b^D 
a  A  ’ 


(141) 


30 


and  from  (140) 


(142) 


y  =T  * 

In  this  way,  two  points  are  determined  (not  four)  from  (140),  (141), 

(142)  with  the  possibility  b  may  be  of  either  sign.  If  b  »  0,  (137) 
represents  an  ellipse  with  vertical  and  horizontal  axes  and  the  points 
of  maximum  and  minimum  ordinates  are  (0,  i  j/d/c  ). 

From  (142)  we  see  that 

1  /  -'T> 

S(y)  =  Span  in  ordinates  -  2 1/  -pp  ,  (143) 

where  S(y)  denotes  difference  between  the  extreme  ordinates.  By 
setting  b  =  0  in  (143) ,  we  see  that  (143)  reduces  to  2|^D/c'.  Thus 

(143)  gives  the  correct  result  for  all  b. 

The  span  in  abscissas,  S(x),  is  determined  by  differentiating 
(137)  with  respect  to  y  and  setting  x’ (=  dx/dy)  =  0.  It  will  follow, 
similar  to  the  case  of  S(y),  that 

S(x)  *  2  1®  .  (144) 

A 

The  ellipse  represented  by  (137)  for  a  given  D  may  be  very  slender 
and  elongated  as  well  as  being  too  large  or  too  small  for  convenient 
plotting,  if  no  coordinate  scaling  is  done,  lienee,  it  is  desirable  to 
scale  the  coordinates  in  such  a  way  as  to  avoid  these  undesirable 
characteristics  as  much  as  possible  in  the  machine  plotting. 


Our  next  objective  is  to  derive  transformations  which  take  the 
ellipse  of  (137)  into  an  ellipse  inscribed  in  a  square  of  fixed  size. 

The  resulting  ellipse  may  still  be  elongated,  with  eccentricity  near 
unity,  but  nothing  is  done  about  this.  No  rotation  of  axes  is  carried 
out  at  any  time. 

We  assume  the  square  will  have  sides  of  M  units  (inches,  centimeters 
or  some  other  convenient  unit).  Thus  we  will  require  in  new  coordinates 
X  and  Y 


S(X)  =  &  M.  (145) 

Since  it  is  the  Dg^(=5.99)  elli^..  ytu  ^e  size  we  wish  to  control,  we 
hereafter  let  D  =  D^,..  The  scaling  transformations, from  n  , a  to 
X,  Y  are 


(146) 


a  ■ 


(147) 


If  these  formulas  are  substituted  into  (137),  the  result  is 


(X  -  X)2  +  —  (X  -  X)  (Y  -  Y)  +  (Y  -  Y)2  =  M  A 


(148) 


, -  •  -  '  4ac  1 

y  ac 

where  X  and  Y  correspond  to  H  , <7  as  given  by  (146)  and  (147).  If 
the  spans  S(X)  and  S(Y)  for  this  transformed  ellipse  are  computed  as 
was  done  for  (137),  it  is  found  that  (145)  holds. 

Some  plotters,  such  as  those  used  at  NWL,  have  a  basic  unit  for  the 


horizontal  axis  which  is  not  equal  to  the  basic  unit  for  the  vertical 


axis.  Thus,  we  now  consider  transformations  to  actual  plotter  coordi¬ 
nates  ( §  ,  7] )  such  that  (145)  is  maintained.  Let  c^  denote  the  number 
of  plotter  units  per  inch  on  the  horizontal  scale  and  c^  the  corres¬ 
ponding  number  on  the  vertical  scale..  On  the  NWL  plotter,  for  example, 
there  are  1024  units  in  about  11  inches  horizontally  and  1024  units 
in  about  9  inches  vertically,  so  that,  in  this  case,  c^  «  93.1,  c2  =  113.8. 

The  scaling  transformations  from  X,  Y  (in  inches)  to  £  ,  Tj  (in 
plotter  units)  are  then 

§  -  ?  »  c1(X  -  X)  (149) 


T?  -  7?  =  c2(Y  -  Y) 


(150) 


Here  we  denote  by  |  , 7)  the  coordinates  on  the  arbitrary  plotter 
scales,  of  the  point  which  is  selected  to  be  the  center  of  the  confi¬ 
dence  ellipse.  In  other  words,  £  ,  7]  are  not  computed  from  X,  Y,  but 
rather  they  are  conveniently  chosen  to  locate  the  ellipse  as  desired. 
Hence,  although  c^  and  c^  are  scale  factors  as  indicated  by  (149) 
and  (150)  it  would  not  be  correct,  in  general,  to  write 

|  =  cxX  and  7]  =  c2Y  ,  (151) 

since  X  and  Y  are  fixed  while  i-  and  7?  are  arbitrarily  chosen. 

Substituting  from  (149)  and  (150)  into  (148)  gives 


-T  U-02  + 
C1 


-$)(r}-rn  +  ~  (Ti  -T])  = 


M  2  A 

4  ac  ' 


(152) 


Letting  V  -7}  =  Tj' ,  we  have 


83 


(153) 


84 


and  maximum  (subscript  2)  abscissas  and  ordinates  are  given  by 


(157) 


a 

Y  A  * 


where  we  continue  to  assume  D  =  D^.  These  four  numbers  determine 
the  boundary  lines  of  the  square  in  which  the  95%  ellipse  is  inscribed. 
In  our  figures  the  square  is  not  explicitly  indicated,  but  the  four 
numbers  are  used  to  determine  the  scale  markings  in  the  figure. 


Numbers  d^,  k^,  d^,  k2  are  determined  such  that 


2V^r  =  d.  x  10  1 
A  1 


(158) 


2Vy  =  d2  x  10 


(159) 


where  k^,  k2  are  integers  and  d^,  d2  are  numbers  such  that 

0.1  ^  d1  <  1,  0.1  ^  d  <  1  . 

The  right  hand  side  of  (158),  (159)  are  simply  representation  of 

numbers  in  "normal  form"  for  FORTRAN  numbers.  Thus 

1226.4  =  0.12264  x  104  with  ^  =  .12264,  ^  =  4. 

Next,  we  determine  units  z^  (horizontal)  and  z2  (vertical)  for 

marking  the  axes  in  the  plotted  figure.  The  following  rule  is  used: 

k, 

z,  =  .01  x  10  if  0.1  ^  d.  <  0.2, 

i  i  ’ 

k. 

z±  =  .02  x  10  1  if  0.2  ^  dt  <  0.5,  (i  =  1,  2)  (160) 

k. 

z.  =  .05  x  10  1  if  0.5  £  d.  <  1  . 

1  \  1 
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To  illustrate,  suppose  =  642  =  .642  x  10^.  Then  =  .642, 

3 

k^  =  3,  therefore  z ^  =  .05  x  10  =  50.  Hence,  the  horizontal  scale 

will  be  marked  every  50  units  (in  /i  coordinates),  or  since  the  span 
is  642,  there  will  be  about  13  divisions  or  perhaps  14  or  15  including 
the  relatively  small  extensions  of  the  axes  beyond  the  ellipse  as 
discussed  below. 

We  next  consider  appropriate  values  to  assign  fx  and  a  on  the 
axes  at  the  left  and  right  and  lower  and  upper  limits  of  the  figure 
itself.  We  wish  these  values  to  be  integral  multiples  of  z^  (horizontal) 
and  z^  (vertical).  The  following  rule  is  used  in  which  [x]  indicates 
the  algebraically  greatest  integer  not  exceeding  x: 


L  Z1  J  Zl  ’ 

r  if  is  an  inte8er 

J  f[V2l  \ 

-  +  1  ] z,  otherwise  , 

U  2iJ  I  1 


^  °2 //Zo  is  an  inteSer 


(161) 
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The  subscript  L  refers  to  the  left  and  lower  boundaries  of  the  figure 
and  the  subscript  R  to  the  right  and  upper  boundaries.  The  quantities 


0,, 


a, , 


O',  z.  ,  z„  are  defined  in  (157)  and  (160). 
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After  A1,  //2,  //L,  /iR,  a  ,  a^%  <rL,  0R  are  determined,  we 

next  compute  the  corresponding  plotter  coordinates  77^.  From 

(146),  (147),  (149),  (150)  we  have 
/I  -  ?  *  rx(*  -  *'), 


cr  -  <r  =  r 2(  *7  -  r?')  ,  (162) 


where 


2  1/cD 


rl  “  Mc1  f  /I  ’ 


=  JL  l/5 

Mc„  ^  * 


(163) 


Finally,  from  (162) 


$-($*-£•  )  +  (0/r.) 

rl  1 


^7  =  (  ^'  -  —  )  +  ( /r2>  . 


(164) 


From  these  equations  we  compute  “0  ^  (i  =  0,  1,  2,  3)  giving  the 

left  and  right  and  upper  and  lower  boundaries  of  the  95%  ellipse  and 
of  the  entire  plotted  figure  in  plotter  coordinates. 

Markings  on  the  horizontal  axis,  "tick  marks",  are  to  be  inserted 
for  the  following  values  of  fl ;  fl fl  L+  z  /1q+  2z^,  ...,  fl^.  Tine 
S  coordinates  of  these  points  are  given  by  the  first  of  (164). 
Similarly,  we  use  the  second  equation  of  (164)  to  compute  the  Tj 
coordinates  for  the  "tick  marks"  on  the  vertical  axis  corresponding 
to  aL  +  z 2 ,  ^  2z2>  ...,  . 

Every  fifth  "tick  mark"  on  both  axes  is  identified  with  numerals, 
i.e.,  numerals  are  printed  at  multiples  of  5z^  (horizontal)  and  5z2 
(vertical).  For  example  if  z^  =  50,  fiQ  -  1250,  fl ^  =  2000,  then 
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numerals  are  printed  at  marks  indicating  1250,  1500,  1750,  2000  and 
"tick  marks"  are  printed  at  1250,  1300,  1350,  ...,  2000. 

In  general,  we  wish  the  numerals  to  be  well  chosen  for  easiest 
interpretation  of  the  figure.  In  order  to  insure  this,  we  determine 
numbers  fi ^  and  <7^,  where  is  the  fi  coordinate  of  the  first 
"tick  mark"  where  a  numeral  is  to  be  placed,  by  the  rule  given  below, 
and  similarly  O ^  is  the  <7 -coordinate  of  the  lowest  "tick  mark"  on 
the  <7 -axis  where  a  numeral  is  to  be  placed.  This  rule  is: 


"4 


~M0~ 

\ 

?zl. 

+  1 

is  an  integer 
otherwise  , 


The  notation  [x]  again  denotes  the  greatest  Integer  function. 

Hence, numerals  are  placed  at 

^4»  ^4  +  ,  .  • . ,  (^4+n^5z^),  H  ^ 

°U*  +  '’z2’  ***’  ^er4  +  n2^z2^*  °4  +  >  <7^  » 

where  n^  and  n^  are  integers. 
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APPENDIX  C 


PROBIT  METHOD 

The  probit  method  is  used  in  the  statistical  analysis  of  tests 
of  the  effectiveness  of  insecticides  and  other  poisons,  and  in  other 
problems  of  biological  assay.  Like  the  NWL  statistical  sensitivity 
program,  it  is  used  in  tests  where  responses  to  stimuli  are  quantal, 
that  is,  every  response  to  a  stimulus  can  be  characterized  as  a 
success  or  a  failure  according  to  some  arbitrary  criterion.  Like 
the  NWL  program,  it  determines  the  maximum  likelihood  estimates  of  the 
mean  and  standard  deviation  of  a  statistical  distribution  which  is 
assumed  to  be  normal.  The  probit  method  is  well  adapted  to,  and  was 
designed  for,  hand  calculations,  the  first  edition  of  [10],  Finney's 
book  on  the  method,  having  been  published  in  1947,  before  electronic 
computers  were  in  general  use.  It  involves  fitting  a  sequence  of 
increasingly  accurate  straight  lines  to  the  empirical  data,  the  calcu¬ 
lations  being  relatively  simple,  but  the  speed  of  convergence  being 
heavily  dependent  on  the  skillful  choice  of  a  line  representing  a 
first  approximation.  Finney  in  [10]  recommends  that  this  be  done  by 
eye,  and  states  that  a  statistician  experienced  in  the  method  will 
ordinarily  get  results  of  sufficient  accuracy  for  practical  purposes 
in  two  further  iterations.  See  further  discussion  below,  page  104  . 

If  great  accuracy  is  desired,  the  NWL  statistical  sensitivity  program, 
with  the  quadratic  convergence  of  the  Newton-Raphson  method,  would  be 
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superior.  But  the  numerical  work  in  the  NWL  program, with  exact 
calculation  of  all  second  derivatives  (as  compared  with  approximations 
in  the  probit  method)  would  be  very  laborious  by  hand.  We  mention 
that  a  useful  table  for  carrying  out  a  probit  analysis  by  hand  is 
given  in  [11]. 

In  a  typical  insecticide  test  of  the  type  discussed  by  Finney 
in  [10],  50  insects  might  be  given  a  dose  of  the  poison  of  which  the 
concentration  is  10.2  milligrams  per  liter,  40  insects  a  dose  of 
7.7  mg./l.,  etc..,  and  the  n umber  of  insects  killed  for  each  dose  or 
concentration  is  recorded.  Finney  actually  works  with  the  logarithm 
to  base  10  of  the  concentration  (or  of  100  times  the  concentration 
if  necessary  in  order  to  make  all  logarithms  positive),  which  he 
calls  the  dosage,  rather  than  with  the  concentration  itself,  which  he 
calls  the  dose.  He  assumes  that  the  critical  dosages  (rather  than 
doses)  of  the  individual  insects,  as  commented  on  in  more  detail  below, 
are  normally  distributed  about  a  mean  fx  with  standard  deviation,  o  . 
This  appears  to  be  purely  an  assumption.  Extensive  experience  with 
tests  of  this  type  indicates  that  the  critical  dosages  are  in  fact 
approximately  normally  distributed. 

Suppose  that  a  given  dosage,  say  x^,  following  Finney's  notation 
of  x  for  dosage,  kills  40  per  cent  of  the  insects  subjected  to  it, 
and  another  dosage,  x^,  kills  70  per  cent.  To  give  intuitive  content 
to  these  results,  we  hypothesize  the  existence  of  a  random  variable 
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called  the  critical  dosage  of  an  individual  insect,  defined  as  the 
dosage  just  sufficient  to  kill  him.  If  the  critical  dosage  for  a 
particular  insect  is  1.2,  he  will  be  killed  by  a  dosage  of  1.2  or  1.5, 
but  not  by  a  dosage  of  1.1.  We  assume  that  those  individual  critical 
dosages  are  normally  distributed  about  a  mean  critical  dosage,  fi  , 
with  standard  deviation  a  .  The  results  cited  above  for  dosages  x^ 
and  X£  are  interpreted  as  meaning  that,  for  a  randomly  selected 
individual  insect,  the  probability  that  his  critical  dosage  is  less 
than  x^,  or  less  than  is  0.4  or  0.7  respectively.  Hence  for  a 
large  random  sample,  about  40  per  cent  will  be  killed  by  a  dosage  x^, 
and  70  per  cent  by  a  dosage  x^.  In  the  language  of  normal  probability 
integrals,  we  can  write 


(165) 


dx 


0.7  . 


(166) 
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If  the  normal  deviates  of  the  form  (x-fi)lo  are  replaced  by  Y-5, 


the  quantity  Y  is  called  the  probit.  Thus, in  these  two  cases  we 
would  have 


(167) 


(168) 


and  from  a  probability  integral  table  we  easily  determine  the  approxi¬ 
mate  values 


Y  =  4.74665,  Y£  =  5.52440.  (169) 

The  term  -5  has  no  theoretical  significance,  but  has  the  effect  of 
making  the  probit  Y  always  positive  in  practical  cases,  since  virtually 
all  of  a  normal  distribution  is  within  is<7  of  the  mean.  Thus  the 
probit  Y  is  related  to  the  dosage  x  by  the  relation 

Y  -  5  =  •  (170) 

The  probit  Y  of  Eqs.  (167)  and  (168),  where  the  probability, 

0.4  or  0.7  or  some  other  value,  is  deduced  from  experimental  data 
(for  example,  the  killing  of  40%  of  the  insects  who  receive  a  given 


dosage  of  poison),  is  called  the  empirical  probit,  to  distinguish 
it  from  the  expected  probit,  also  denoted  by  the  letter  Y,  which  is 
introduced  later.  There  exists  still  another  probit,  the  working  probit 
Eq.  (204),  denoted  by  the  symbol  y.  The  empirical  probit  can  have 
the  value  i°o.  Suppose  that.  25  insects  receive  a  certain  dosage  and 
all  are  killed.  Then  the  equation  corresponding  to  (167)  is 

_ _  rY_5  2 

(l/j/2 IF  exp(-u  /2)du  =  1  , 

—  oo 

of  which  the  solution  is  Y  *  oo  .  Similarly,  if  no  insects  art*,  killed 
at  a  given  dosage,  the  empirical  probit  is  Y  =  -oo. 

Later  Finney  introduces  new  parameters  ct  and  ft  to  express  the 
relationship  between  the  probit  Y  and  the  dosage  or  stimulus  x  by 
the  formula 

Y  =  a  +  fix  .  (171) 

The  variable  Y  here  is  called  the  expected  probit  and  the  expression 
ct  +  ftx  is  similar  to  the  expressions  fta^-d  and  ftb^-a  in  the 
NWL  analysis  (see  Eqs.  (5)  through  (8)),  or  ftc^-CL  for  any  stimulus  c^ 
whether  a  success  or  a  failure.  To  clarify  this  similarity,  we  point 
out  that  the  symbols  ft  play  exactly  analogous  roles  in  the  two 
systems  of  notation,  but  Finney’s  01,  which  we  temporarily  denote  as 
differs  from  the  NWL  Ct ,  Ct^,  by  an  algebraic  sign  and  the 
additive  term  -5.  For,  equating  the  stimuli  x  and  c,  we  have 

Y-5-  V  fr-5  =  SgZ  .  BzJL  .  0C  .  aN> 
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from  Which  it  follows  that 


a 


F 


(172) 


Considering  now  Finney’s  notation  only,  from  (170)  and  (171)  we 
deduce 


(173) 


and 


H  =  (5  -a) /0 

a  -  i/£. 


(174) 


The  expected  probit  Y  as  in  Eq.  (171)  is  always  finite  for  a 
finite  dosage  x,  since  this  equation  merely  expresses  a  linear 
relationship  between  Y  and  x  which  is  a  line  of  best  fit  in  some 
sense  when  the  optimum  values  of  the  parameters  ct  and  0  have  been 
determined.  This  is  in  contrast  to  the  empirical  probit,  which  can 
have  a  value  of  i  oo  as  pointed  out  above. 

Now  for  the  next  few  steps,  following  Finney's  analysis  on 
pages  246-248  of  [10],  we  suppose  that  ws  have  a  general  probability 
distribution,  not  necessarily  normal,  of  critical  dosages.  We  will 
suppose  that  L,  the  logarithm  of  the  likelihood  function,  is  a  function 
of  two  parameters  0  and  <f>  ,  and  make  certain  comments  on  maximizing 
L,  and  derive  Finney's  method  for  approximating  the  second  derivatives 
of  I  with  respect  to  the  parameters  0  and  ^  .  Later  we  specialize 
to  the  normal  distribution  and  identify  $  and  <f>  with  a  and  0  as 
discussed  above  (see  Eqs.  (171),  (173),  (174)). 


Suppose  that  a  dosage  Ao  has  a  probability  P  of  killing  a 
randomly  selected  insect  and  let  Q  =  1  -  P  **  probability  of  failure, 
and  suppose  that  it  is  observed  in  a  test  that  r  out  of  n  insects 
receiving  the  dosage  kQ  are  killed.  Thus  r/n  is  what  can  be  called 
the  empirical  probability  p  of  success,  while  P  is  a  function  of  the 
parameters  $  and  ^  as  well  as  the  dosage  x  and  depends  on  the 
assumed  probability  distribution  (not  necessarily  normal).  The  object 
is  to  determine  the  values  of  6  and  <f>  which  maximize  the  probability 
of  obtaining  the  observed  or  empirical  probabilities,  r/n  at  dosage 
X  and  other  empirical  probabilities  at  other  dosages  in  the  experi¬ 
ment.  In  short,  we  wish  to  determine  the  maximum  likelihood  estimates, 
6  and  <f>  ,  for  the  parameters  0  and  <j>  . 

The  probability  that  r  out  of  n  insects  will  be  killed  by  dosage 


A  is 
o 


P(r)  =  (“)Pr<Tr, 


(175) 


(  )  being  a  binomial  coefficient  with  value  n!/[r! (n-r)!].  Suppose 

that  a  series  of  K  dosages  is  tested  in  an  experiment  with  empirical 


probabilities  of  the  form  r/n  for  each  dosage.  Then  the  logarithm,  L, 
of  the  probability  of  obtaining  all  of  the  observed  results,  dropping 
from  L  constant  terms  (not  depending  on  Q  and  <t>  )  of  the  form 
log  (”)  »  is 


L  -  L  r  log  P  +  L  (n-r)  log  Q  , 


(176) 


L  denoting  summation  over  all  dosages  (Finney  uses  the  notation  S 
for  L  ) . 
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We  remark  here  that  n  in  this  equation  is  only  the  number  of 


insects  subjected  to  one  given  dosage.  If  the  values  for  the  K 
different  dosages  are  denoted  by  n^,  n^,  . ..,  n^,  the  grand  total  N 


is  N  =  +  ...  +  n^,.  Similarly  we  would  have  r^,  r 2, 


*  rK5 


P^,  •••»  pjr»  Q^»  Q2»  Qk*  But  Eq*  (176)  we  follow 

Finney's  analysis  (with  L  substituted  for  his  S) . 

For  the  values  of  $  and  <f>  which  maximize  L  we  will  have 


9l  9l 

dO  m  d<t> 


0. 


(177) 


Now,  from  Eq.  (176), 


9L  T  r  9P  y  n-r  9q 
QO  "  * P  dO  Q  60  0 


(178) 


Putting  (since  Q  =  1-P),  r  =  pn,  and  performing  some 

Ou  Q  u 

simple  algebraic  steps,  we  easily  show  that 


9 1.  y  r  n(p-P)  dpi 

60  L  .  PQ  60  J  * 


and  similarly 


(179) 


Suppose  that,  at  an  intermediate  stage  in  the  calculation,  0  and  <p 
have  values  4> ,  which  make  the  derivatives  dl>/60  and  dL/9^ 

numerically  small  but  not  exactly  zero.  Hie  corrections  bQ  ,  b<f> 


are  given  approximately,  by  Taylor's  theorem,  by 


.§JL  + 

80, 
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00,  00. 
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(182) 


where  the  subscript  1  indicates  that  the  derivatives  are  to  be 
evaluated  at  6  -  0^,  0  =  0^. 

Finney  now  states  ([10],  page  248)  that  the  second  derivatives 
"may  be  simplified  by  putting  p  =  P  after  differentiation,  in  order  to 
give  expected  instead  of  empirical  values",  and  derives  approximations 
for  the  second  derivatives  in  terms  of  the  first  derivatives.  These 
appear  to  depend  for  their  validity  on  the  assumption  that  the  f 
particular  value  P^  is  near  the  empirical  probability  p  (=r/n),  *j/ereas, 
in  fact,  p  may  be  a  value  which  is  not  approximated  closely  when  the 
final  maximum  likelihood  estimates  0,0  are  obtained.  Yet  the 
method  evidently  has  worked  well  in  practice.  Indeed,  it  is  proved 
later  in  this  appendix,  pages  108-112  ,  that,  if  Finney's  method 
converges  at  all,  it  must  converge  to  the  true  maximum  likelihood 
estimates  0,0,  as  determined  by  the  NWL  program,  in  spite  of  the 
inexactness  of  the  approximations  which  Finney  makes  at  Intermediate 
stages.  A  convergence  proof  has  not  been  worked  out,  nor  does  Finney 
give  such  a  proof  in  [10].  But  a  little  computational  experience  with 
the  method  soon  convinces  one  that  the  method  does  converge,  at  any  rate 
for  sufficiently  good  initial  approximations. 


The  following  is  an  attempt  to  obtain  the  expressions  at  which 


Finney  arrives  for  the  second  derivatives.  We  suppose,  following 
Finney’s  notation,  that  (  0  0 ^)  is  a  point  in  the  00-plane, 

at  which  we  wish  to  approximate  the  second  derivatives,  which  will 
be  denoted  by  (02L/002)1,  (  0^ 70000)^  (02L/0  02)1.  We 
will  illustrate  with  (0^1/  0  0  2)^,  hut  similar  considerations  will 
apply  in  the  casas  of  the  other  second  derivatives. 

We  shall  simply  differentiate,  with  respect  to  0  ,  the  first 
derivative  as  given  by  Eq.  (179),  and  for  convenience,  in  the  next 
few  steps,  we  drop  the  subscript  1,  since  this  differentiation  will 
apply  to  any  point  (0,0)  including  (0^,  0^).  Primes  will  denote 
partial  derivatives  with  respect:  to  0  ;  thus  P"  represents  0^P/0  0 

Eq.  (179)  states 


,  r  Efe-g)  ,,t 

0  0  *  PQ  * 


(183) 


9l  r  (np-nP)P’ 

0  0  p  _  p?'  ’ 


(184) 


since  Q  =  1  -  P. 


Derivative  of  numerator  of  (184)  =  -n(P’)^  +  (np-nP)P"  .  (185) 
Derivative  of  denominator  of  (184)  =  P’  -  2PP1  =  P’(1-2P)  .  (186) 
Hence  , 


•Mr 


(P’)  +  (P-P) 


P1 


From  this  it  is  seen  that  we  get  a  simplified  expression  for 
dh/de*  although  not  necessarily  an  extremely  close  approximation, 
by  assuming  that  p  -  P  for  every  dosage  x.  Probably,  in  the  majority 
of  practical  cases,  the  positive  errors  in  the  summation  approximately 
balance  the  negative  errors.  Further  comments  on  this  are  given 
below.  Making  this  assumption  that  p  =  P  in  all  cases t  we  have 

(P*)2]  (188) 


or,  putting  in  the  subscript  1,  since,  following  Finney,  we  wish 
this  to  apply  at  a  point  designated  as  (  0-^,  ^)> 


(189) 


Similarly  we  have  the  approximations 


(190) 


82L  \ 

ded<t> ) 


(191) 


From  this  development  it  appears  that  Finney  simply  accepts 
the  errors  which  undoubtedly  occur  in  using  (189)  -  (191),  trusting 
that  in  the  long  run  the  positive  errors  will  approximately  balance 
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the  negative  errors.  Experience  indicates  that  the  method  does  converge 
satisfactorily  in  practice  in  realistic  cases,  especially  in  the  hands 
of  an  investigator  who  is  experienced  in  the  method  and  who  makes  a 
skilled  first  approximation.  If  it  converges  at  all,  it  must  converge 
to  the  true  maximum  likelihood  estimates  $  ,  <f>  ,  as  shown  in  Theorem  C-l 
in  this  appendix.  However,  it  is  possible  that  artificial  or  unrealistic 
cases  exist  in  which  the  probit  method  fails  to  converge,  since  no 
general  convergence  proof  has  been  given  so  far  as  the  present  authors 
are  aware. 

Using  the  approximations  represented  by  (189)  -  (191) ,  and  for 
convenience  dropping  the  !l— "  signs  although  they  are  understood  here, 
Eqs.  (181)  and  (182)  for  5  Q  and  6#  take  the  form 


^^r(H)2+  (If) 


=  L 


n(p-Px)  /  0p 


P1Q1 


00 


(192) 


■«*(»)(«)  *  *•* 


P1Q1 


0  P 
d<t> 


n(p-Px) 


P1<>1 


d  p  \ 

d<t>  )  • 


0-93) 
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specialized  to  the  present  normal  case.  Using  the  first  derivatives 
as  given  by  (197),  (198),  the  equations  become 


Z,  P  and  Q  being  determined  from  (194)  and  (196)  by  the  value  of 
Y  from  (199),  i.e.  with  Ct  =  c^,  =  d^. 

Defining  the  weighting  coefficient,  w,  by 


7. 2 

W  “  PQ  » 


Eqs.  (200)  and  (201)  are  the  equations  for  the  estimation  of  the 

weighted  linear  regression  of  the  variable  (p-P)/Z  on  x,  the  weight 

nw  being  assigned  to  each  value  of  (p-P)/Z. 

This  is  briefly  shown  as  follows.  Suppose  we  have  a  set  of  K 

points  {  (x^,  y^)  }  and  a  set  of  corresponding  v?eights  {v^}  (y^ 

will  be  identified  with  (p-P)/Z  for  the  various  stimuli  in  the 

present  situation,  and  v^  with  the  corresponding  weight  nw).  It  is 

desired  to  find  the  line  y  =  c  +  dx  such  that  the  weighted  sum 

2 

S  -  Dv^(c  +  dx^  -  y^)  is  minimized.  We  put 

2  'll  =  2  vi(c  +  dxi  "  y±)  =  0 


2  Id  "  ^vi<c+  dxi  "  VXi  =  °» 
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from  which  follow  the  normal  equations, 

clvi  +  dXv^  *  ^viyi 
„  clv1x1  +  dXv^2  -  £  vixiyi.  ^ 


(203) 


Letting  c,  d,  y . ,  v.  here  correspond  to  8c,  8d,  (p-P)/Z, 

nZ2 

nw  *  "^“respectively,  the  similarity  of  (203)  to  (200)  and  (201) 
is  clear. 

We  now  introduce  the  working  probit,  y,  [10,  p.  250],  defined  by 

y  =  Y  +  E_!-2  .  (204) 

The  working  probit  y  depends  on  both  the  expected  probit  Y  (which 
determines  P  and  Z)  and  the  per  cent  kill  p.  Convenient  tables  for 
determining  y  as  a  function  of  these  arguments  are  given  in  [10]. 

Replacing  (p  -  P)/Z  in  Eqs.  (200)  and  (201)  by  y  -  Y,  by  (204), 
transposing  the  terms  containing  Y  to  the  left,  and  replacing 
Z2/(PQ)  by  w,  by  (202),  Eqs.  (200)  and  (201)  give 

8cXnw  +  8d£nwx  +  £nwY  *  Lmry  (205) 

8c£nwx  +  8d£nwx2  +  XnwxY  =  Lnvxy  ,  (206) 

But  the  expected  probits,  Y,  are  determined  in  each  cycle  of  the 
calculation  from  the  equation  of  the  line  determined  in  the  previous 
cycle,  Y  =  c^  +  d^x,  Eq.  (199),  in  the  present  situation.  In  the 
first  cycle  of  calculations,  c^  and  d^  would  depend  on  the  line  fitted 
by  eye  (see  further  comments  on  this  following  Eq.  (210))  to  the 
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experimental  data,  and  the  improved  values  resulting  from  the  cycle 


would  be  c2^=  ci  +  ^c)  anc*  ^1  +  ^  ^  *  logically,  we  should 

have  cn  and  d^  at  the  start  of  our  cycle,  and  improved  values 
and  d^^,  but  we  will  stick  with  our  notation  c.  ,  d,  for  the  initial 
values  as  in  (199).  Thus  each  expected  probit  Y  is  determined  from 
the  corresponding  stimulus  x  by  Eq.  (199),  and  so  we  have 


£nwY  s  c^Xnw  +  d^£nwx  (207) 

£nwxY  =  c^Xnwx  +  d^£nwx^  .  (208) 

Putting  these  results  in  (205)  and  (206),  and  recalling  that 
c2  =  c^  +  6  c,  d2  =  +  6d,  we  have 

CjXnw  +  d^Dnwx  =  Xnwy  (209) 

2 

c^JCnwx  4-  d^^nwx  =  £nwxy  (210) 

for  the  direct  determination  of  the  improved  values  c^  and  d^. 

The  skillful  determination  of  a  first  approximation  Y  =  c^  +  d^x 
by  fitting  a  line  "by  eye"  to  the  empirical  data  is  a  matter  of 
experience  and  defies  exact  analysis.  Finney  states,  [10,  page  248], 
"care  and  experience  in  the  choice  of  first  approximations  will  usually 
ensure  that  two  cycles  give  a  numerical  accuracy  sufficient  for 
practical  purposes'.'  The  dosages  x  are  known  and  the  empirical  probits, 
Y,  which  are  determined  from  the  empirical  probabilities  p(=  r/n), 
putting  p  in  place  of  P  in  Eq.  (194),  are  plotted  against  x.  The 
subsequent  fitting  of  a  line  to  these  plotted  points  is  similar  to 
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"least  squares  by  eye",  which  is  well  known  In  obtaining  "quick  and 
dirty"  solutions  to  practical  problems.  But  the  difficulties  are 
compounded  here  by  the  fact  that  some  of  the  empirical  probits  may  be 
+  oo  .  If,  for  a  certain  dosage  x,  all  of  the  subjects  are  killed, 
or  r  =  n,  then  the  empirical  probability  p  is  100%  ox  1.0,  and 
consequently,  by  (194) ,  the  empirical  probit  Y  is  oo.  Similarly  if, 
for  a  given  x,  p  is  0  (i.e.  none  of  the  subjects  is  killed),  then  the 
empirical  probit  Y  is  -oo.  Presumably,  users  of  the  method  learn 
through  experience  how  much  to  raise  or  lower  the  line  Y  =  c^  +  d^x, 
corresponding  to  empirical  probits  of  oo  and  -oo, respectively,  so  as 
to  get  a  reasonably  good  first  approximation  and  thus  justify  the 
quoted  statement  from  [10]  that  two  subsequent  cycles  of  calculation 
will  usually  give  a  numerical  accuracy  sufficient  for  practical 
purposes. 

In  solving  this  linear  system  (209)  and  (210^  for  C£  and  d^, 
we  introduce,  with  Finney,  [10,  pp.  55  and  250],  symbols  x,  y,  S  , 

XX 

S  ,  defined  as  follows: 

VTf  * 


X  = 

£nwx 

£nw 

(211) 

y  = 

£nwv 

£nw  * 

(212) 

XX 

£nw(x  -  x)2  , 

(213) 

'xy  “ 

£nw(x  -  x)(y  -  y)  . 

(214) 
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We  can  then  show  by  algebraic  manipulations,  of  which  we  omit  the 
details,  that 


S 

xx 


r  2 
£ranx. 


( £nwx)  ^ 
£nw 


(2i5: 


s 

xy 


£  nwxy 


(£nwx)  (JCnwy) 
£  nw 


(216 


The  determinant,  of  the  linear  system  (209),  (210),  is 


A  =  JCnwJCnwx  -  ( £nwx)  = 


S  £tvw, 

XX 


(217; 


the  last  expression  following  by  (215).  Clearly,  A  >  0,  by  (213) 
and  the  fact  that  the  weights  nw  are  positive.  The  solution  of  the 
system  (209),  (210),  for  c^  and  d^  by  Cramer's  rule  from  linear 
algebra  is  then  given  by 


2 

4  •  c2  ~  -£nw7  *  £ nwx  "  JCnwx  •  £  nwxy 
A  •  =  £  nw  •  £  nwxy  -  £  nwx  •  £nwy , 

and  from  (216),  (217)  and  (219)  we  obtain 

A  •  d  =  S  £nw  , 

2  xy 

or 


S 

S 

xx 


£nv 

£xm 


S 

=  . 
S 


xx 


(218: 

(219 


(220 


We  can  then  show  that 


by  evaluating  A  *  (y  -  d^x)  by  means  of  several  of  the  equations 
starting  with  (211),  carrying  out  the  necessary  algebra,  of  which 
again  we  omit  the  details,  and  showing  thet  the  resulting  expression 
is  equivalent  to  A  *  C2  as  given  by  (218). 

The  improved  values  c2  and  d^  are  thus  given  by  Eqs.  (220),  (221), 
and  the  resulting  equation  at  the  end  of  the  computing  cycle  is 

Y  -  c2  +  d2x,  (222) 

as  an  improvement  on  Eq.  (199);  that  is,  (222)  in  general  leads  to 
a  larger  value  of  the  likelihood  function  than  (199). 

This  analysis  leads  to  an  efficient  algorithm  for  use  with  a 
desk  calculator,  and  such  calculations  have  been  carried  out  by  one 
of  the  present  authors.  The  experimentally  determined  values  of  the 
logarithms  of  the  stimuli,  x,  are  entered,  and  the  corresponding  values 
of  n,  r  and  p(=  r/n) .  The  values  of  Y  are  then  calculated  by  Eq.  (199), 
Y  *  c^  +  d^x, from  the  previous  cycle  or  from  the  line  fitted  by  eye. 

The  values  of  w  and  nw  can  then  be  computed  from  Eq.  (202),  from  the 
values  of  P,  Q  and  Z,  all  of  which  depend  on  Y,  but  [10]  gives  tables 
for  w  which  are  accurate  and  extensive  enough  for  most  practical  work. 
The  working  probits,  y  (see  Eq.  (204)),  are  then  found  from  tables 
in  [10].  We  next  compute  £nwx,  £nwy,  Xrtvx  ,  and  £nwxy ,  all  of 
which  are  efficiently  computed  on  a  desk  calculator.  The  values  of 
x,  y,  and  S  are  then  found  by  Eqs.  (211),  (212),  (215)  and  (216), 

d2  by  (220)  and  c2  by  (221),  and  we  then  have  the  improved  equation, 

(222). 


After  several  such  computing  cycles,  usually  not  more  than  four 
or  five  cycles  unless  the  [c^J.  and  {d^j.  are  computed  to  high 
precision,  say  to  more  than  five  significant  digits,  the  values  start 
repeating  themselves,  with  (for  some  n)  cq  =  c^^  =  c^^  =  •••  to 
the  number  of  digits  carried,  and  similarly  for  the  {d^}  ,  indicating 
that  convergence  has  occurred.  We  show  in  the  following  theorem  that 
in  such  a  case,  denoting  the  final  regression  equation  so  obtained  as 
Y  =  Ot  +  /?x  (see  Eq.  (171)),  then  the  corresponding  values  of  jtt  and 
a  t  computed  by  Eqs.  (174),  must  be  identical  with  the  maximum  likelihood 
values  H  ,  O  computed  by  the  NWL  program. 

THEOREM  C-l 

Assume  that  the  parameters  cn ,  d^  in  the  Finney  method  converge 
to  values,  a  ,  y3  ,  with  (1)  cn  Ot  ,  d^  ■>  y3  as  n  -»•  °o  ,  (2)  if 
c^  *  a  and  d^  =  yS  in  the  cd -plane,  the  increments  6c,  id  as  given 
by  Eqs.  (200)  and  (201)  both  vanish,  and  (3)  by  continuity,  6c  and  6d 
are  arbitrarily  close  to  zero  if  (c^,  d^)  is  arbitrarily  close  to 
(Ot , 8 )  in  the  cd-plane.  Assume  further  that  the  experiment  is  such 
that  (35),  (36)  in  the  NWL  theory  are  satisfied,  guaranteeing  the 
existence  of  a  unique  pair  fi  ,  O  of  maximum  likelihood  estimates. 

Then  if  fi  ,  o  are  computed  from  this  pair  ct  ,  yS  by  Eqs.  (174), 

|  these  values  n  ,  c  are  identical  with  the  values  , a  as  computed 

by  the  NWL  method. 

Proof.  The  proof  consists  simply  of  showing  that,  if  we  have 
values  Ot ,  y3  such  as  are  described  in  the  hypothesis,  then  for  the 
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corresponding  values  in  the  NWL  theory  (corresponding  to  the  same 


values  of  fi  and  O  ,  but  vith  Ot  N  in  general  different  numerically 
from  Oj,,  as  indicated  by  Eq.  (172))  we  will  have  La  *  Lp  »  0, 
which  can  be  true  only  for  the  unique  maximum  point  ( a  ,  f}  )  or 
(H  t  a)  in  cases  where  the  conditions  (35),  (36)  are  satisfied. 

Hence  the  point  (C of  the  hypothesis  must  correspond  to  the 
point  (0^,0)  or  (/I,  O)  of  the  NWL  theory.  Naturally  there  are 
many  details,  which  we  proceed  to  give. 

From  Eqs.  (200)  -  (201),  the  corrections  6c  and  6d  both  vanish, 
indicating  that  convergence  has  occurred  in  the  probit  method,  if  and 
only  if  the  right-hand  sides  of  these  equations  vanish.  This  linear 
system  has  a  nonsingular  matrix,  since  it  has  been  shown  following 
Eq.  (217)  that  A  >  0.  These  right-hand  sides  are  equivalent  to 
0L/0a  and  0L/0/J  in  the  probit  system,  as  is  easily  shown.  Hence 
convergence  occurs,  and  6c*  8d  =  0,  if  and  only  if  a  point  is 
reached  at  which  0L/0ct  =  0L/0/S  =0,  and  this  result  is  to  be 
expected  on  general  principles  as  well  as  on  the  specific  analysis 
given  here.  Hence  our  object  is  to  show  that,  if  a  point  is  reached 
where  0L/0a  =  0L/0jS  =  0  in  the  probit  system,  then  we  also  have 
0L/0a  *  0L/00  *  0  in  the  NWL  system,  so  that  we  are  at  the 
maximum  likelihood  parameter  point  (0»,^)  or  (H  ,<*)• 

In  comparing  the  probit  and  NWL  systems,  we  have  to  deal  with  the 
univariate  probability  integral  p(x),  q(x)  =  1  -  p(x)  and  z(x),  with 


Finney's  P,  Q,  Z  and  P^,  Q^,  corresponding  to  a  specific  dosage  x. 


with  his  p  ”  r/n  »  empirical  probability  for  a  given  dosage,  and  with 
die  expressions  p^  and  q^  in  the  NHL  system.  Hence  we  comment  here 
on  these  various  notations. 

The  probability  integrals  p(x),  and  q(x)  and  z(x),  are  given  by 
Eqs.  (10)  -  (12)  with  slightly  different  notation  from  that  which  is 
used  here,  and  also  the  functions  P(x)  and  Q(x)  are  defined  earlier, 
page  95  .  The  NWL  expressions  p ^  and  q^  are  defined  in  Eqs.  (3),  (4) 
in  terms  of  these  p  and  q  functions  of  certain  arguments. 

We  now  suppose  that,  in  a  sequence  of  iterations  of  the  probit 
method,  a  situation  has  been  reached  in  which  0L/90  =»  dL/ d<f>  *  0, 
or,  Eqs.  (179)  -  (180)  are  satisfied,  implying  that  a  maximum  of 
Finney's  likelihood  function  L  has  been  reached.  Supposing  further 
that  the  underlying  distribution  is  normal,  we  use  Eqs.  (197)  -  (198) 
and  obtain 


-£1  „  r  S-(P 
001  *  PQ 

ill;  „  y  Hi?  • 
dp  *  pq 


z  =  o 


xZ  =  0  • 


(223) 


(224) 


These  obviously  imply  that  the  right-hand  sides  vanish  in  Eqs.  (200)  - 
(201),  the  equations  which  are  actually  used  in  the  iterations,  so 
that  further  iterations  would  merely  repeat  the  values  already  obtained. 

Our  object  is  to  show  that,  in  this  situation,  we  must  also  have 
0L /da  *  0L /dp  *  0  in  the  NWL  method,  so  that  we  must  have  arrived 
at  the  unique  maximum  likelihood  point  (a  ,£)  or  {ft  ,  O)  in  the  NWL 
system,  using  the  {ft  t  o')  system  as  a  common  coordinate  system  for 
purposes  of  comparison. 
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By  Eqs.  (194)  and  (196), 

■V"  \  .  _  l\-JL 


K  ~  P 


(225) 


since  Y-5  is  equivalent  to  (x^  -  fi  )/o  ,  and  similarly 


\  =  1  *  Pk 


(226) 


Now  the  derivative  L a  in  the  NWL  system  is,  by  Eqs.  (10),  (11), 
(13),  (18)  and  (19),  given  by 


m 


=  X 


<&)  -  &r) 


*  «  p(^)  ' 


(227) 


and  by  Eqs.  (225),  (226),  this  is  equivalent  to 

*k  Z’- 


L«  =  m  0 - n 


(228) 


(referring  only  to  subjects  receiving  the  stimulus  x^) 


Here  we  have  the  NWL  derivative  and  the  i  and  j  summations  as 
in  Eq.  (5),  but  Finney's  P^,  and  corresponding  to  the  dosage  x^ 
At  a  given  x^,  insects  receive  the  stimulus  and  r^  of  them  are 
killed.  Hence  the  contribution  to  the  NWL  La  from  these  insects 
is 


(ok  -  »k)zk  W 


(229) 


and  taking  all  of  the  K  stimuli  into  account,  the  NWL  La  is 


(230) 


(231) 


k  “iA 

La  =  L  (Pk  -  pk>  ,  (232) 

k=l  x"k  K  K 

and  this  by  Eq.  (223)  is  -L^  in  the  probit  system. 

Hence  (NWL)  =  0  if  and  o,nly  if  (probit  method)  «  0,  and 
a  similar  analysis  shows  that  the  L^j  derivatives  vanish  together. 

Hence  Finney's  maximum  likelihood  point  ( fi ,  <T)  must  be  identical 
with  the  NWL  maximum  likelihood  point  ( fl  ,  (T ) ,  and  this  completes 
the  proof  of  Theorem  C-l. 

In  [4],  [10]  and  other  books  and  papers  dealing  with  experiments 
in  which  the  probit  method  is  used,  wa  often  find  expressions  such  as 
LD50,  LD99,  etc.,  signifying  the  lethal  dose  (or  dosage)  for  50%  of 
the  subjects,  for  99%,  etc.  Finney  in  [10]  also  uses  expressions 
such  as  ED50,  the  letters  meaning  "effective  dose",  in  cases  where 
effects  other  than  the  deaths  of  the  subjects  are  considered  successes. 
Once  the  maximum  likelihood  values  fl , 0  have  been  determined,  values 
such  as  LD50  and  LD99  are  very  simply  determined  from  a  table  of 
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•  <• 


probability  integrals.  Since 

p(0)  =  0.50,  p(2.3263)  -  0.99, 

where  p(x)  is  the  probability  integral. 


(233) 


p(x) 


-3-  f*  exp(-u: 


'/2)du, 


(234) 


then,  if  and  xQQ  represent  DD50  and  LD99  respectively,  we  have 


x50 


99 


«  0, 


x99  ’  ** 


=  2.3263  , 


(235) 


(compare  with  Eqs.  (165),  (166)),  from  which  we  find 


x50  “  P  »  x99  -  /*  +  2.3263  0  , 


(236) 


and  similarly  for  other  expressions  such  as  LD25,  LD90,  or  LD95. 

Reference  [4]  deals  with  an  experiment  in  which  poison  (cobra 
venom)  was  administered  to  dogs.  The  input  values  including  calculated 
values  of  x  or  log  (100  *  dose),  are  as  follows: 


Dose 

tng/kg 

X 

=  log(100  dose) 

n 

r 

0.06 

0.77815 

4 

0 

0.09 

0.95424 

5 

1 

0.10 

1.00000 

9 

3 

0.11 

1.04319 

6 

3 

0.12 

1.07918 

8 

7 

0.25 

1.39794 

6 

6 

0.50 

1.69897 

6 

6 

The  dose  is  in  milligrams  of  poison  per  kilogram  of  the  dog's  weight. 

Hie  maximum  likelihood  values  H  ,  O  ,  using  the  dosages,  x, 
were  calculated  at  NHL  by  two  distinct  methods,  (1)  by  desk  calculator 
using  the  probit  method  as  discussed  in  this  appendix,  and  (2)  on  an 
IBM  7030  (STRETCH)  computer,  using  the  NWL  method  as  described  in 
the  main  body  of  this  report.  The  results  of  the  STRETCH  calculation 
are  shown  on  page  115  of  this  report  on  which  it  will  be  observed 
that  values  of  the  dosage  x  to  5  decimal  digits  were  used  (with  one 
trivial  discrepancy  in  the  smallest  value  of  x,  in  that  published 
tables  of  logarithms  give  0.77815  as  the  value  of  log  G,  whereas 
the  value  printed  by  the  computer  was  0.77814).  In  the  NHL  hand 
calculations,  rounded  values  of  the  x's  were  used,  0.78,  0.95,  etc. 

The  1D99  values  were  also  computed  by  Eqs.  (236)  above.  Then  the 
results  were  expressed  in  terms  of  the  doses  (in  mg. /kg.  as  discussed 
above),  by  taking  antilogarithms.  The  values  of  O  so  obtained  are 
of  doubtful  significance.,  since  the  doses  are  not  normally  distributed 
if  the  dosages  are  so  distributed.  But  the  results  were  expressed 
in  terms  of  the  doses  for  purposes  of  comparison  with  the  results 
in  [4] . 

It  is  not  known  to  the  present  authors  how  the  authors  of  [4] 
performed  their  calculations.  But  it  is  assumed  that  they  used  the 
problt  method,  since  they  used  such  terminology  as  U)50  and  ID99, 
and  gave  a  figure  showing  the  line  Y  «  Ct  +  fix  and  curves  determining 
957.  confidence  limits,  which  are  similar  to  the  corresponding  curves 


in  [10].  It  is  stated  in  [4]  that  the  LD50  and  ID99  doses  are  0.105  ' 
and  0.148  mg. /kg.  respectively.  Ihe  remaining  results  on  the  line 

t 

"Edgewood,  ref.  [4]"  in  the  table  below  were  deduced  by  the  present 

. 

authors  by  means'  of  Eqs.  (236.)  and  by  taking  logarithms  to  express 
results  in  terms  of  dosages.  In  view  of  the  uncertainty  as  to  the 
methods  used  in  [4],  close  agreement  between  the  Edgewood  and  NWL 
results  was  not  necessarily  to  be  expected. 

However,  a  comparison  between  the  two  lines  of  NWL  results, 
for  dosages  rather  than  doses,  is  meaningful.  The  input  values  of  x 
were  slightly  different,  as  has  been  explained.  4dditional  slight 
differences  may  be  attributable  to  roundoff  error  and  similar  causes 
at  intermediate  stages,  the  NWL  and  probit  methods  being  quite  different 
But  theoretically  the  final  results  should  be  identical  (for  identical 
input),  by  Theorem  C-l  of  this  appendix. 

The  results  of  these  calculations  were  as  shown  in  the  following 


n  m  ID 50 


Edgewood,  ref . [4]  1.02119  .064051  1.17026  .105  .11589  .148 

NWL  Hand  Calc.  1.0230S  .064432  1.17214  .10525  .11599  .14864 


NWL  STRETCH  Calc.  1.02355  .064127  1.17273  .10557  .11591  .14884 


In  this  table,  the  first  three  columns  refer  to  the  output  In 
dosages,  and  the  last  three  to  the  output  in  doses,  milligrams  of 
poison  per  kilogram  of  the  dog's  weight. 

Also,  90%  confidence  limits  for  the  NWL  hand  calculations  were 
computed  by  Eq.  (4.6),  page  63  in  [10],  which  is  repeated  here  as 
(237).  For  a  given  Y,  for  which  the  corresponding  x  is  computed 
from  the  equation  Y  =  &  +  /3x,  the  limits  are 


where 

g  «  t2/^2^)  ,  (238) 

and  t  is  the  value  such  that  the  standard  t -variate  in  the  Student  t 
distribution,  with  K-2  degrees  of  freedom,  with  probability  0.90 
(in  the  case  of  90%  limits)  lies  in  the  interval  (-t,  t).  In  the 
present  case.  In  which  K  =  7  and  hence  there  are  5  degrees  of  freedom, 
the  value  of  t  is  2.02.  The  positive  and  negative  signs,  in  the  double 
sign  in  (237),  give  the  upper  and  lower  confidence  limits  respectively. 

These  curves  and  the  line  Y  »»  Ct  +  /?x,  are  shown  in  Figure  4. 

If  g  >  1,  expression  (237)  will  clearly  have  a  negative 
radicand  when  x  **  x,  and  for  this  reason  95%  confidence  limits  cannot 
be  computed  for  this  example  by  this  method.  For  probability  0.95, 
the  value  of  t  as  discussed  above  is  2.57,  and  for  this  example  we 
have  jit  «  15.52013,  *  .02732,  and  hence  g  =  1.004  >  1.  It  is 

not  known  to  the  present  authors  how  the  95%  confidence  limits  of 
[4]  were  computed. 
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APPENDIX  D 
SHALL  SAMPLE  THEORY 

For  determining  confidence  regions ,  the  method  recommended  by 
Golub  and  Grubbs  in  [14]  (see  Section  IV  of  the  present  report)  and 
used  in  the  existing  NWL  program,  is  that  based  on  the  asymptotic 
normality  of  the  distribution  of  the  maximum  likelihood  estimates 
and  O  for  large  samples.  Since  the  experimental  cases  received 
at  NWL  usually  contain  30  results  or  more  (firings  at  armor  plate), 
this  asymptotic  or  large  sample  method  is  reliable. 

Also,  this  asymptotic  method  is  relatively  simple  and  straight¬ 
forward  to  program  for  a  computer.  It  is  not  exact,  however,  except 
in  the  limit  as  the  sample  size  becomes  infinite. 

In  this  appendix,  a  method  is  outlined  by  which  confidence  regions 
could  be  set  up,  which  would  be  exact  for  finite  samples  of  any  size. 
Unfortunately,  it  appears  that  it  would  take  a  great  deal  of  computer 
time.  But  we  feel  that  it  is  worth  giving  this  method  for  completeness. 
If  computationally  feasible,  it  would  be  of  greatest  interest  and 
usefulness  for  small  samples  of  say  4  or  5  results,  for  which  the 
asymptotic  large  sample  method  would  give  results  substantially  in 
error.  For  this  reason,  such  a  method  or  theory  is  often  referred  to 
as  a  small  sample  theory.  It  could  equally  well  be  called  an  exact 
theory,  giving  exact  results  for  samples  of  any  size,  which  the 
asymptotic  theory  fails  to  do. 
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The  confidence  regions  generated  by  the  exact  method  discussed 
here  would  be  connected  sets  in  the  plane,  but  not  ellipses  in  general. 
These  point  sets,  each  containing  infinitely  many  points,  could  be 
approximated  by  working  with  finite  grids  of  small  mesh  size. 

We  begin  with  some  definitions  and  generalities  about  confidence 
regions  for  experiments  of  the  type  here  considered,  and  many  of 
these  remarks  apply  to  the  asymptotic  large  sample  theory,  or  to  any 
method  of  setting  up  confidence  regions  for  these  experiments,  as  well 
as  to  the  exact  theory  to  be  presented  in  detail  in  this  appendix. 

We  are  given  a  set  of  stimulus  levels  (projectile  speeds  in  the 
case  of  the  armor  plate  experiment),  c^,  c 2,  ...,  cN  in  the  notation 
of  this  report.  Section  IV,  where  N  =  n  +  m.  We  could  have,  for 
instance,  cl  =  1153  ft. /sec.,  c 2  =  1161,  etc.  These  c±  are  constants 
which  serve  to  define  i  experiment.  They  are  not  random  variables. 
In  setting  up  confidence  regions  we  conceive  of  a  large  number  of 
replications  of  the  experiment,  with  exactly  the  same  set  of  {c^} 
each  time,  but  with  different  outcomes,  and  analyze  the  distribution 
of  these  outcomes  by  probabilistic  methods.  Ihe  fact  that  in  many 
experimental  situations  it  is  not  possible  to  control  the  stimuli 
exactly,  in  particular  in  the  case  of  projectile  speeds,  is  immaterial 
from  the  theoretical  point  of  view.  Once  we  are  given  the  constants 
of  the  experiment,  the  set  of  {c^  ,  we  can  compute  the  functions 
f  (  H*C)  of  Eq.  (65),  each  f^  depending  on  the  constant  c^,  the 
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random  variable  6^,  and  the  variables  jl  and  <7.  If  we  changed  any 

of  the  ve  would  have  a  different  experiment,  with  a  different 

set  of  confidence  regions  corresponding  to  the  various  possible  outcomes. 

The  random  variables  are  the  of  Section  IV,  6^  being  given 

the  value  1  if  the  k-th  shot,  of  speed  c^,  produced  penetration,  and 

a  value  of  0  if  there  was  no  penetration.  Since  there  are  N  of  these 

N 

random  variables,  each  having  two  possible  values,  there  are  2  possible 

N 

outcomes  of  the  experiment,  or  2  values  in  the  joint  distribution  of 
the  random  variables.  When  we  conduct  an  actual  firing  experiment, 
determining  whether  or  not  penetration  occurs  at  1153  ft. /sec.,  at 
1161  ft. /sec.,  etc.,  in  the  armor  plate  experiment,  we  are  taking  a 
sample  from  the  joint  distribution  of  the  set  {$k}.  In  practice  we 
could  not  without  many  trials  repeat  the  experiment  even  once,  because 
of  the  impossibility  of  precisely  controlling  the  stimuli,  but  we  can 
conceive  of  a  large  number  of  replications,  always  with  the  same 
constants  {c^}. 

Those  members  of  the  set  {c^}  for  which  penetration  occurs  form 

the  set  {a^}  in  the  notation  of  this  report,  and  the  "failures"  form 

N 

the  set  {bj  }.  In  some  of  the  2  possible  outcomes,  for  a  given  set 
of  constants  {c^,  Eqs.  (35)  and  (36),  the  necessary  and  sufficient 
conditions  for  the  existence  of  a  unique  maximum  of  the  likelihood 
function,  with  O  =  1//3  >  0,  will  be  satisfied.  These  will  be  called 
valid  outcomes.  The  remaining  outcomes,  in  which  Eqs.  (35)  -  (36) 
are  not  satisfied,  will  be  called  degenerate  outcomes.  Thus  the 

N 

number  of  valid  outcomes,  plus  the  number  of  degenerate  outcomes,  is  2  . 
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It  is  a  trivial  matter  to  construct  examples  of  both  types  of  outcome 
say  for  N  «  3,  the  minimum  value  of  N  for  a  possible  valid  outcome. 

In  any  experiment,  where  the  set  is  specified,  with  N  ^  3, 

degenerate  outcomes  will  necessarily  exist,  for  instance  by  taking 
every  "success",  a^  greater  than  every  b^,  so  that  (35)  is  not  satis¬ 
fied.  Also,  valid  outcomes  must  exist,  in  every  experiment  of  interest 
at  any  rate.  Hence  if  we  denote  the  numbers  of  vali-  outcomes  and 
degenerate  outcomes  by  Nv  and  respectively,  we  have 


N  +  N 
v 


d 


(239) 


0  <  Nv<2N,J 

where  N(=*  n  +  m),  as  before,  is  the  number  of  members  of  the  set  {c^}, 

or,  in  the  armor  plate  experiment,  the  number  of  projectiles  fired. 

For  any  one  of  the  Nv  valid  outcomes  of  the  experiment,  say  the 

t-th  one,  there  exists  by  Theorem  4  a  unique  point  At(/i fc, o  fc)  in  the 

H<S  -plane,  representing  the  maximum  likelihood  estimates  of  the 

parameters  fl  and  O  determined  by  the  t-th  outcome.  The  calculation 

of  these  values  fl  t  and  O  t  to  a  preassigned  accuracy  is  of  course 

the  principal  object  of  the  program  described  in  this  report.  The 

HO  -plane  and  three  of  these  points  are  shown  in  Figure  5.  In  actual 

N 

experiments  Nv  may  be  very  large,  but  is  always  less  than  2  and 
therefore  finite.  If  N  »  30,  a  realistic  value  in  armor  plate  firings, 
2N  >  109. 


For  any  method  which  may  be  used  for  determining  confidence 
regions,  there  is  associated  with  each  point  Afc(#/t,  <7fc)  a  corresponding 


r 
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confidence  region  Rfc.  For  the  asymptotic  large  sample  method  which 
is  used  in  the  program  (Section  IV),  each  of  these  regions  is  an  ellipse 
with  At  at  its  center,  and  for  convenience  the  confidence  regions 
are  shown  as  ellipses  in  Figure  5.  But  for  the  exact  or  small  sample 
method  to  be  described  in  this  appendix,  the  regions  are  in  general 
not  ellipses.  However,  with  each  point  there  is  associated 

a  corresponding  confidence  region  Rfc. 


Maximum  Likelihood  Points  At(/M  ,  <7  )  and 
Associated  Confidence  Regions. 
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So  far  we  have  merely  stated  that  these  confidence  regions  exist, 
and  are  associated  one-to-one  with  the  maximum  likelihood  points 
At(fl  t,  <7t)  in  the  flO  -plane.  We  are  now  ready,  however,  to  deal 
with  such  questions  as  the  following.  What  is  a  confidence  regie:.? 

What  conditions  must  be  satisfied  by  the  system  of  confidence  regions 
generated  by  a  given  experiment,  i.e.  a  given  set  of  {c^},  in  order  that 
the  system  qualifies  as  a  set  of,  say,  90%  confidence  regions  for  the 
experiment? 

First,  the  true  but  generally  unknown  parameters,  /iq  and  0 , 

are  constants  which  we  are  trying  to  estimate  from  our  experiment, 

and  it  is  meaningless  to  speak  of  probabilities  that  the  true  parameter 

point  k(fi.  a  )  lies  in  a  given  region.  Hie  true  values  u,  and  a 
o  o  o  o  o 

could  be  determined  to  any  desired  accuracy  and  confidence  level  by 

a  sufficiently  large  number  of  experiments.  Being  constants,  they 

are  not  subject  to  any  statistical  distribution.  Alternatively,  we 

may  if  we  wish  think  of  the  true  parameters  u.  and  a  as  controlled 

~  o  o 

by  an  imaginary  opponent  who  sets  them  at  any  values  desired  by  him, 
but  does  not  inform  us  of  the  values.  Again,  they  are  not  subject  to 
any  statistical  distribution.  Once  this  opponent  has  set  the  values, 
ou  •'sk  ie  to  estimate  them  by  experimental  methods. 

.ce  the  experiment  has  been  defined,  by  specifying  the  constants 
{c^,  a  set  of  Nv  maximum  likelihood  points  {(At(/ifc,  <*t)}  and 
associated  confidence  regions  (by  whatever  confidence  region 

method  is  i:i  use  at  the  tJme)  are  determined.  There  may  be  many 
millions  of  them  in  realistic  experiments. 


But  we  can  conceive  of  the  computation  of  the  entire  finite  system, 


since  the  entire  system  is  defined  once  the  {c^J  are  given,  and  it  is 
defined  before  we  take  any  sample  of  the  joint  distribution  of  the 
random  variables  as  discussed  above,  by  an  actual  firing  of  N 

projectiles,  in  the  case  of  the  armor  plate  problem. 

For  any  specified  outcome  of  the  experiment  defined  by  the  given 
set  {CjJ,  either  a  valid  or  a  degenerate  outcome  in  the  terminology 
used  above,  and  for  any  assumed  values  (trial  values,  etc.)  fl,0  of  the 
parameters,  whether  equal  to  the  true  values  fl Q,  oq  or  not,  there  is 
a  probability  of  occurrence  of  the  given  outcome  expressed  in  terms  of 
the  variables  fi  and  O  .  We  can  begin  with  Eq.  (65)  giving  the 
probability  density  function  for  each 


fk  -  f  (  «k;  a) 


(240) 


and  write  the  joint  density  function  for  the  entire  set  {fi^}  in.  the 
form 

N 

F(  6,,  ...,J (1 1  0 ")  =  11  ^  (  i  M  ® )  i  (241) 


(242) 
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Nov  suppose  a  definite  outcome  is  specified  by  assigning  particular 


values  {Ij*}  (each  0  or  1)  to  the  set  {jj},  e.g.  6^  -  0,  b2  ■  0, 

6 2  ■  1,  . ..  .  The  probability  of  occurrence  of  the  specif ied  outcome 
is  then 


*  F  (dx*,  ...,  dN*;  /i  ,  4F  > 


N 

77 

k-l 


['&)] 


(243) 


If  we  replace  each  6 by  its  value,  0  or  1,  and  replace  each  c^ 
which  is  a  "success"  by  an  a^  and  each  c^  'trtiich  is  a  "failure"  by  a 
bj  f  we  get 


G(HtO) 


n  /  at-  n  \ 

m 

n  p  (  a  ) 

n  q 

i-1  '  °  ' 

j-1 

(244) 


which  is  consistent  with  the  expression  in  Eq.  (2).  Eq.  (243)  or  (244) 

then,  gives  the  probability,  in  terms  of  the  assumed  values  of  the 

variables  fi  and  0 ,  that  the  specified  outcome,  valid  or  degenerate, 

will  occur.  The  sum  of  all  these  G  functions,  for  a  given  pair  of 

N 

values  of  ft  and  a ,  over  all  of  the  2  possible  outcomes  of  the 
experiment,  must  be  1,  since  for  any  sample  from  the  distribution  of 
the  se.t  {6^}»  i.e.  for  any  firing  of  N  projectiles  at  the  specified 
speeds  {c^}  In  the  armor  plate  test,  one  and  only  one  of  the  2N 
possible  outcomes  must  occur. 


We  can  now  explain  the  significance  of  the  confidence  regions 
{Rfc}  of  Fig.  5,  one  Rfc  for  each  valid  outcome,  and  state  the 
condition  which  they  must  satisfy  in  order  to  be  true  confidence 
regions  at  a  specified  confidence  level.  The  discussion  is  still 
perfectly  general  in  that  it  applies  to  any  method  of  setting  up  a 
system  of  these  regions.  For  definiteness,  we  will  specialize  to 
907.  confidence  regions,  but  the  numerical  value  of  the  confidence 
level  is  immaterial . 

The  condition  referred  to  is  as  follows.  The  system  of  907. 
confidence  regions,  {  Rt }  ,  must  be  such  that, whatever  values  the 
true  but  unknown  parameters  Hq  and  Oq  may  have,  the  conditional 
probability  is  at  least  0.90  that,  given  that  the  outcome  of  the 
experiment  is  a  valid  one,  the  true  parameter  point  Aq (  H q>  <jq) 
is  covered  by  at  least  one  region  of  the  system  of  confidence  regions. 

We  give  a  simple  numerical  illustration  in  terms  of  the  situation 
represented  in  Fig.  5r  where  there  are  only  three  valid  outcomes, 
and  then  we  phrase  the  condition  more  formally  in  terms  of  the 
G(  H  ,,  0  )  notation. 

In  Fig.  5  we  consider  the  situation  with  respect  to  the  point 
wb^.ch  is  to  be  regarded  as  a  possible  position  of  the 
true  parameter  point  hQ(  q ,  <tq),  since  the  position  of  Ae  is  unknown. 
Suppose  we  have  the  following  probabilities:  <f%)  =  *08,.  1 

*  .56,  m  .16.  This  means  that,  if  the 

population  parameters  are  fl  1  and  a  ' ,  then  the  probability  is  .08 
that  the  result  is  valid  outcome.  No.  1  and  similarly  for  subscripts 


2  and  3.  Hence  the  probability  that  the  outcome  is  a  valid  one  is  the 
sum  of  these,  or  .80,  and  the  probability  of  a  degenerate  outcome  is 
1  -  .80  or  .20.  Therefore,  given  that  the  outcome  is  a  valid  one,  the 
conditional  probability  that  it  is  outcome  No.  1  is  .08/. 80  or  .10, 
and  similarly  for  outcomes  Nos.  2  and  3  the  conditional  probabilities 
are  .56/. 80  or  .70,  and  .16/. 80,  or  .20,  respectively,  these  conditional 
probabilities  summing  to  1  since  we  suppose  that  there  are  only  these 
three  valid  outcomes.  But  the  point  k'(/l  »,  a  ')  is  contained  in  both 
R2  and  Rg,  and  the  conditional  probability  that  the  outcome  is  No.  2 
or  No.  3  is  .70  +  .20,  or  .90.  Hence,  if  the  true  parameter  point 
Aq  is  at  A',  the  conditional  probability,  given  that  the  outcome  is  a 
valid  one,  that  the  point  Aq  is  covered  by  at  least  one  of  the  regions 
{Rtl  »  18  ,90#  111X18  the  required  condition  is  satisfied  for  the  point 

A'  by  the  system  {rJ.  Obviously,  in  order  for  the  condition  to  be 
satisfied  for  all  admissible  positions  of  the  point  A q(  fl  ,  ) ,  all 

such  positions  would  have  to  be  covered  at  least  once  by  the  system 
{R£}  ^  this  is  not  the  case  in  the  simplified  Fig.  5. 

This  example  can  be  looked  at  in  another  way  in  terns  of  a  large 
number  of  replications  of  the  experiment  defined  by  the  constants  {cjJ. 
Suppose  the  experiment  is  performed  1,000,000  times  and  that  the 
true  parameter  point  A <X0)»  as  before,  is  the  same  as  A’ (.fl*,  a ’). 
Then,  because  of  the  assumed  probabilities  G^^',  <y‘)  -  .08,  etc., 
outcome  1  will  occur  about  80,000  times,  outcome  2,  560,000  times, 
outcome  3,  160,000  times,  and  one  or  another  of  the  possible  degenerate 
outcomes  will  occur  the  remaining  200,000  times.  Therefore,  out  of 
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800,000  replications  in  which  a  valid  outcome  occurs,  outcome  2  or  3 
will  occur  720,000  times,  or  in  90%  of  the  800,000  cases.  But 

ff'),  which  we  assume  here  to  be  the  true  parameter  point, 
is  oovered  by  both  of  the  confidence  regions  R,,  and  R^,  and  thus  is 
covered  in  720,000  out  of  every  800,000  replications  which  result  in 
a  valid  outcome,  or  90%,  If  every  admissible  position  of  the  point 
A 0(t*0»  Oq)  satisfies  a  similar  condition,  the  system  of  confidence 
regions  is  shown  to  be  a  legitimate  system  at  the  90%  confidence 
level *  ■ 

The  following  analysis  is  a  simple  generalization  of  the  foregoing 

example.  We, now  assume  that  the  set  has  N  members,  so  that  there 

are  2  possible  outcomes,  valid  and  degenerate.  Let  A be  any 

point  in  the  flO  -plane  which  Is  a  possible  position  of  the  true 

parameter  point  A  (  fl  ,  a  )»  and  let  G,  (  jU  ,  a )  (see  Eq.  (244))  be 
O  O  O  K 

the  probability  that  the  k-th  outcome  will  occur,  if  the  true  parameters 
are  fJ.  and  O  .  Let 


V(M,0)=  £ 
v  k 


(^(/i  ,  a)  j  k-th  outcome  is  valid 


(245) 


p  AM  ,  a  )  *  1  -  P  (  H,0)  =  £  I  k-th  outcome  is  \ 

a  v  k  l  K  degenerate  J 


(246) 


Pc(Ji,  a)  S  £  |c^O«,a)|  point  A  (.Mt  o)  is  in  \  }  ,  (247) 


(v  for  "valid”,  d  for  "degenerate",  c  for  "covers") 


$ 
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Their  the  following  condition  must  be  satisfied. 

>  0.90.  (248) 

V  V 

If  a  similar  condition  is  satisfied  by  every  admissible  position 
of  the  true  parameter  point  Ao(/*q,  <TQ),  then  the  system  {Rfc}  is  a 
legitimate  907.  system  of  confidence  regions.  It  is  assumed  here 
that,  given  the  set  {c^,}*  we  have  not  only  a  set  { of 
maximum  likelihood  points,  one  for  each  valid  outcome,  computable 
by  the  principal  program  of  this  report,  but  also  a  system  {Rfc}  of 
90%  confidence  regions.  Since  the  discussion  still  applies  to  any 
method  of  determining  confidence  regions,  we  merely  assume  that  the 
system  {^t}  exists,  and  have  not  commented,  in  the  preceding  discussion, 
on  the  method  of  determining  the  system. 

We  now  turn  to  the  exact  or  so-called  small  sample  method,  the 
description  of  which  is  the  principal  object  of  this  appendix,  for 
determining  the  system  {Rfc}  of  confidence  regions. 

'  First  we  comment  on  the  phrase  "admissible  positions"  which  has 
been  used  with  regard  to  the  true  parameter  point  A 0(H0*  <7Q)  •  We 
will  have  only  a  finite  set  {Rfc}  of  confidence  regions,  each  of  finite 
area,  and  hence  the  set  of  possible  parameter  points  AQ  cannot  possibly 
be  covered  by  the  union  of  the  sets  {Rt}  if  it  is  of  infinite  area. 

In  an  actual  experiment  it  will  ordinarily  not  be  difficult  to  set 
limits  a  ,  and  u  such  that  the  true  mean  u  must  be  in  the 
interval  (H  ,  ,  ft  ).  In  the  armor  plate  experiment,  for  instance. 


the  projectile  speeds  are  always  positive  and  we  may  know  from  the 

characteristics  of  the  gun,  ammunition,  etc.,  that  the  speed  is  always 

less  than  2500  ft. /sec.  In  such  a  case  we  could  say  that  hq  must 

be  in  the  interval  (0,  2500).  From  a  knowledge  of  the  bounds  on  hq 

and  on  the  magnitudes  of  the  individual  stimuli,  we  could  then  compute 

bounds  a  .  and  O  on  the  true  standard  deviation  a  ,  o  . 

min  may  o  min 

normally  being  0  and  O  mav  positive.  Thus  in  an  actual  experiment  we 
can  ordinarily  limit  the  admissible  positions  of  the  true  parameter 
point  to  those  in  the  interior  of  a  rectangle  in  the  HO  -plane, 
determined  by  the  inequalities  H  <  HQ  <  H  mav> 

O  j  <  a  <  a  ,  and  this  rectangle  can  be  covered  by  a  finite 
min  o  max 

number  of  confidence  regions,  each  of  finite  area. 

We  note  in  passing  that  the  same  problem  of  covering  a  possibly 
infinite  area  in  the  HO  -plane  exists  even  in  the  asymptotic  large 
sample  method,  since  in  practice  we  always  have  a  finite  number  of 
stimuli  and  therefore  a  finite  number  of  possible  valid  outcomes, 
each  associated  with  its  own  confidence  ellipse  of  finite  area.  We 
do  not  dwell  on  this  point  here,  the  large  sample  method  being  discussed 
in  Section  IV,  although  without  explicit  mention  of  this  point. 

In  describing  the  determination  of  the  confidence  regions  {Rt}, 
cue  confidence  region  Rfc  (as  well  as  one  maximum  likelihood  point 
AfcC/lkt^k^  ^or  t*ie  outcomes  of  the  experiment,  we 

will  consider  a  numerical  example  which  will  bring  out  the  essential 
simplicity  of  the  procedure,  while  if  on  the  contrary  we  set  up  a 
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perfectly  general  notation  with  multiple  subscripts,  etc*,  we  might 

create  a  false  Impression  that  the  process  Is  obscure  and  complicated. 

N 

We  recall  that  the  separation  of  the  2  possible  outcomes  of  the 

experiment  into  N^,  valid  outcomes  and  degenerate  outcomes,  where 
N 

Nv  +  Nj  =  2  ,  is  independent  of  the  assumed  values  fl  and  O  ,  but 

depends  only  on  whether  or  not  the  { a^}  and  {b^}  ,  that  is,  the 

successes  and  failures  among  the  stimuli  {c^. }  ,  satisfy  Eqs.  (35)  » 

(36).  Consequently,  whatever  admissible  parameter  point  A (fl,o) 

we  consider,  we  will  have  exactly  valid  outcomes  to  take  into  account. 

Moreover,  G .  (  /l ,  <y),  the  probability  that  the  k-th  valid  outcome 

(in  some  enumeration  of  these  outcomes)  will  result  when  the  parameters 

are  (i  and  <T  (arbitrary  admissible  values)  will  be  a  positive  number 

on  the  open  interval  (0,  1),  though  of  course  very  close  to  0  in 

some  cases,  those  in  which  a  large  number  of  factors  in  Eq.  (244) 

are  small  probabilities.  The  sum  £  <7)»  stunned  over  all 

k 

valid  outcomes,  will  be  p ^(|f,0)*  by  definition  of  the  latter,  Eq.  (245), 
and  therefore  £  {  [l/pv(/i  ,  a  )]Qk(  H  )}  »  summed  over  all  valid 
outcomes,  will  be  1.  Note,  however,  that  all  probabilities  depend  on 
fl  and  <S  .  If  A1(/i1,  O^)  and  ^^2*  are  distinct  parameter 
points,  Pv(^1»  +  Pv(^2’  °2)  in  8eneral»  and  ° ])  + 

G^(  H  2*  In  general  for  the  same  k. 

Coming  now  to  the  numerical  example  referred  to  above,  let  us 
suppose  that,  for  some  admissible  position  A 0 ^)  of  the  parameter 


point,  an  arbitrary  and  representative  admissible  position, 

[l/py(iU1,  takes  its  largest  value,  .39,  for  k  =  46, 

its  next  largest  value,  .26,  for  k  =  22,  and  so  on  as  listed  in  the 
table  below.  Thus,  given  that  a  valid  outcome  occurs,  the  conditional 
probabilities  that  outcomes  numbers  46,  22,  ...  occur,  for  the  assumed 


1*  al  of 

the  parameters 

,  are  . 

,39, 

ti/pv(/v 

*1>] 

G46 

*1>  = 

.39 

V3 

G22 

V  = 

.26 

n/Pv^!* 

•i>] 

G77 

V  = 

.17 

[l/pv(0!. 

*i)] 

G39 

°t)  = 

.07 

[l/pv(^x. 

G26 

u*v 

<V  « 

.02 

[!/?„<*!. 

ai)] 

G51 

v- 

.02 

[l/pv(*l. 

*i>3 

G41 

V 

al>  - 

.01 

[1/pv(^l» 

<T1)] 

G66 

V 

V  = 

.01 

Cumulative  sums 

.39 


.65 

.82 

.89 

.91 

.93 


.94 

.95 


We  list  the  conditional  probabilities  for  all  valid  outcomes  in  non¬ 
increasing  ord^.r  of  the  probabilities  as  shown  in  the  table.  In  case 
of  ties,  for  example  .02  for  k  =  26  and  51,  we  use  the  increasing 
order  of  subscripts.  We  compute  and  list  the  cumulative  sums  as 
shown  and  at  the  end  of  the  table  this  sum  would  attain  the  value  1 
as  has  been  pointed  out.  In  defining  a  set  which  will  be  denoted  as 

S.  ,  however,  we  are  interested  in  the  table  only  down  to  the  point 
A1 
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where  the  cumulative  sum,  including  "tins  for  last  place",  first 
reaches  or  exceeds  the  value  .90  (in  the  case  of  90%  confidence 
regions).  In  the  case  under  consideration,  we  do  not  exclude  k  *  51 
on  the  ground  that  the  cumulative  sum  as  listed  attains  the  value 
.91  with  k  =  26,  as.  there  is  no  theoretical  reason  for  preferring 
k  =  26  to  k  =  51  or  vice  versa.  The  value  .90,  or  any  other  value 
on  the  interval  (0,  1)  will  necessarily  be  reached  or  surpassed, 
since  the  cumulative  sum  eventually  reaches  the  value  1. 

We  now  define  the  set  as  the  set  consisting  of  outcomes 

numbers  22,  26,  39,  46,  51,  and  77,  the  order  being  immaterial  once 

the  makeup  of  the  set  is  established.  This  is  a  set  of  outcomes,  not 

of  points  or  of  probabilities.  There  exists  a  similar  set  S.  for 

A 

every  admissible  position  of  the  parameter  point  A(  fl ,  a),  and  the 
method  of  determining  is  simple  and  clear  in  principle  from  the 
example  given,  although  naturally  it  would  be  an  enormous  computing 
job  in  practice. 

•  Also,  we  note  that  the  number  of  admissible  positions  of  A( M ,  o), 
ranging  over  the  interior  of  a  rectangle  in  the  HO  -plane,  is 
uncountably  infinite.  Hence  we  could  not,  even  in  principle,  determine 
for  every  admissible  position  of  A (.  H ,  O  ) .  If  we  wished  to  program 
the  method  for  a  computer,  we  would  have  to  approximate  the  interior 
of  the  rectangle  by  a  finite  grid  of  small  mesh  size,  or  something 
similar. 


We  can  now  define  the  point  sets  in  the  H<f  -plane  constituting 
the  90%  confidence  regions  {Rj,  one  for  each  of  the  valid  outcomes. 

They  are  defined  as  follows. 

.  {J  |a(  *!.<*>  |  SA  contains  the  t-th  valid  outcome  J  .  (249) 

Here  the  symbol  {J  represents  the  set  theoretic  union,  each  Rt  in 

general  being  a  closed  domain  in  the  plane. 

To  show  that  we  have  defined  a  legitimate  system  of  90%  confidence 

regions,  we  suppose  that  the  true  parameter  point  Aq(  fi  ,  <Tq)  is  the 

point  <0  of  the  current  numerical  example,  and  show  that, 

given  that  the  outcome  of  the  experiment  is  a  valid  outcome,  the 

probability  is  at  least  .90  that  the  true  parameter  point,  A^yU^,  <71)» 

is  covered  by  the  confidence  region  Rfc  corresponding  to  the  outcome 

which  occurred.  We  recall  that  A^  is  an  arbitrary  representative 

admissible  parameter  point  and  so,  if  it  passes  the  test,  every 

admissible  parameter  point  A(/J,  0)  will  have  done  so. 

Suppose  that  A^(yu^,  0^)  is  the  true  parameter  point,  and  that 

the  outcome  is  a  valid  one.  The  table  of  probabilities  for  this 

example  (page  132)  shows  that,  with  probability  .93,  the  outcome  of 

the  experiment  will  be  one  of  the  following:  outcome  22,  26,  39,  46, 

51  or  77.  But  R^  or  confidence  region  22  contains  A^(/4^,  0^),  by 

Eq.  (249),  since  SA  contains  outcome  22  as  has  been  stated.  Similarly, 
A1 

the  point  A^  is  contained  in  R2g,  Rgg»  ...»  Ryy.  Hence  in  93%  of  all 
cases  in  which  A^(^,  0^)  is  the  true  parameter  point  and  in  which 
the  outcome  is  valid,  the  parameter  point  A1  is  covered  by  the 
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pertinent  confidence  region  Rfc.  To  put  it  another  way,  for  every 
1,000,000  such  cases  (true  parameter  point  and  valid  outcome), 
there  will  be  390,000  cases  in  which  outcome  46  is  the  result, 

260,000  cases  of  outcome  22,  ...,  20,000  cases  of  outcome  51,  for  a 
total  of  930,000  cases,  and  in  each  of  these  930,000  cases  the  parameter 
point  is  covered  by  the  pertinent  confidence  region  Rfc. 

Hence  the  required  test  is  passed  by  the  representative  admissible 
parameter  point  A^(  <T  ^) ,  and  therefore  by  all  such  admissible 
points,  and  it  has  been  shown  that  a  legitimate  system  of  90%  confidence 
regions  has  been  defined.  The  90%  confidence  level  was  taken  only 
as  an  example,  and  similar  reasoning  would  apply  to  95%  or  50%  confidence 
regions  or  those  at  any  other  confidence  level. 

We  wish  now  to  show  that  these  confidence  regions  are  arbitrarily 
small  for  a  sufficiently  large  number  of  stimuli,  by  showing  that  the 
sets  S.  as  defined  on  page  133,  which  determine  the  confidence  regions 

A 

Rfc  by  Eq.  (249),  consist  of  outcomes  whose  maximum  likelihood  points 
Ak(^k»^k^  are  arbitrarily  close  to  the  assumed  parameter  point 
A  (A*,#),  for  a  given  confidence  level  such  as  90%. 

A  heuristic  proof  has  been  worked  out,  indicating  that  these 
confidence  regions  are  arbitrarily  small  for  a  sufficiently  large 
number  of  stimuli,  by  showing  that  the  sets  as  defined  on  page  133, 
which  determine  the  confidence  regions  Rfc  by  Eq.  (249),  consist  of 

jtlk»^k)  are  arbitrarily 
close  to  the  assumed  parameter  point  A (fi  ,  <f)»  for  a  given  confidence 


outcomes  whose  maximum  likelihood  points  A^( 


level  such  as  90%. 


We  remark  that  this  property  of  small  confidence  regions  {&t} 
for  large  values  of  N  is  not  a  formal  requirement  of  the  system  of 
confidence  regions.  The  formal  requirement  is  embodied  in  Eq.  (249), 
for  90%  confidence  regions.  But  it  is  a  natural  and  desirable  property, 
and  one  which  is  possessed  by  the  asymptotic  large  sample  theory.  An 
experimenter  conducting  an  armor  plate  test  might  veil  ask  himself, 

"Why  should  1  fire  a  large  number  of  projectiles,  and  use  a  large 
number  of  specimens  of  the  plate,  unless  I  thereby  increase  the  accuracy 
of  my  knowledge  of  the  characteristics  of  the  plate?" 

The  proof  referred  to  is  not  completely  general  or  rigorous, 
in  that  only  a  subset  of  the  set  of  all  valid  outcomes  is  considered. 
Iforeover,  the  details  are  rather  lengthy.  Hence  we  merely  state  the 
result  here,  after  describing  the  subset  of  valid  outcomes  \diich  is 
used  and  the  general  nature  of  the  proof. 

In  the  proof,  a  population  mean  n  is  assumed, and  the  stimuli 
{c^}  are  spaced  at  equal  sub  intervals  in  the  interval 
The  exact  spacing  is  not  specified  in  advance,  and  the  principal  object 
of  the  proof  is  to  determine  how  fine  a  spacing  is  uecessary  in  order 
to  achieve  a  condition  which  will  be  described  shortly  regarding  the 
probability  content  of  subsets  of  the  outcomes. 

An  arbitrarily  small  positive  number  e  is  to  be  specified  (we 
can  think  of  the  value  of  e  as  being  under  the  control  of  an  imaginary 
opponent  who  is  trying  to  break  our  proof  down),  and  a  key  point  of 
the  proof  consists  of  showing  that,  however  small  €  may  be,  it  is 
possible  to  take  the  uniform  spacing  of  the  stimuli  so  fine  that  90% 


of  the  probability  content  of  all  valid  outcomes  considered  in  the 
proof  (for  907.  confidence  limits)  will  be  concentrated  in  the  subset 
which  has  its  maximum  likelihood  estimates  fl  in  the  interval 
(  H  -  €  ,fl  +  £). 

The  subset  of  the  set  of  all  possible  outcomes  which  we  consider 

in  this  heuristic  proof  is  the  set  of  those  of  the  form 

FF  ...  FFSFSS  ...  SS,  where  the  length  of  the  string  of  consecutive 

failures  is  not  necessarily  the  same  as  the  length  of  the  string  of 

consecutive  successes,  since  we  may  be  much  closer  to  one  end  than 

to  the  other  of  the  interval  (.fl  .  ,  fl  ).  Outcomes  of  this  form 

min’  max 

have  the  interlacing  required  by  Eq.  (35),  so  that  they  are  valid 
outcomes.  But  this  minimal  interlacing  is  of  small  significance  if 
N,  the  number  of  stimuli,  is  large,  so  that  for  practical  purposes 
we  can  say  that,  for  any  given  outcome  of  this  set,  we  have  failures 
below  a  certain  stimulus  level,  say  1025  ft. /sec.  in  the  armor  plate 
experiment,  and  successes  above  the  same  level. 

It  is  intuitively  clear  that,  for  an  outcome  of  the  form 
FF  ....  FFSFSS  ....  SS,  the  maximum  likelihood  estimator  fl  of  the 
point  A  (fl  tO  )  will  occur  at  approximately  the  stimulus  level  of  the 
interlaced  S  and  F;  for  in  that  case  the  factors  of  the  expression 
G(fl , O  )  (see  Eq.  (244))  will  consist  almost  entirely  of  factors  of  the 
form  pC^  -  fl  )/ 0  ]  with  a ^  >  fl  and  q[(b^  -  )/ 0  ]  with  b  <  fl  , 

both  types  of  factor  rapidly  approaching  the  value  1  as  a^  ■  fl  or 
fl  -  bj  becomes  large  compared  with  O  .  If  the  isolated  S  of  the 
sequence  occurs  for  c^  =  1096  ft. /sec.  and  the  isolated  F  for 
Cj^  =  1097,  we  can  assume  that  fl  for  this  outcome  is  approximately 
1096.5. 
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am 


The  object  is  to  shew  that  N  can  be  taken  sufficiently  large  so 


that  the  interval  (/*-€,/!  +  O  contains  at  least  90%  of  the 
probability  content  of  our  subset  of  the  set  of  all  valid  outcomes, 
for  90%  confidence  limits.  It  will  be  convenient  to  divide  the  interval 
(V  +  £  )  into  an  Integral  number,  L,  of  equal  subintervals,  and  to 
place  one  of  the  at  the  center  of  each  sub  interval.  Thus  ve 

have  the  situation  shown  in  Fig.  6,  in  which  several  of  the  {c^} 
near  H  +  £  are  shown,  and  the  S's  and  F's  near  fi  +  6  for  three 
of  the  outcomes  are  shown,  the  isolated  S  in  the  j-th  outcome  being  at 
Cj^,  etc.  By  what  was  said  above,  the  maximum  likelihood  estimates 
H  for  outcomes  j-1  and  j  are  approximately  fi  +  c  and  fi  +  £  +  (  €  /L) 
respectively,  € /L  being  the  distance  between  successive  stimuli. 
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Figure  6 
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The  mathematical  details  are  rather  lengthy  and  ve  do  not  give 
them  here,  but  we  state  the  result.  If  the  integer  L,  the  number  of 
subintervals  into  which  (//,  fl  +  €)  is  divided,  is  so  large  that 

L  ^  1  ~  [q<«  /<0/p<€/0]  ’  <2S0) 

where  p(t)  is  the  probability  integral,  Eqs.  (10),  (12), 

q(t)  *  1  -  p(t),  and  a  is  the  value  assumed  for  the  population 
standard  deviation,  then  the  sum  of  the  probabilities  of  the  outcomes 
whose  maximum  likelihood  estimates  H  are  in  the  interval  (fl  -€  ,fi+  €) 
is  more  than  90%  of  the  sum  of  the  probabilities  of  all  the  outcomes 
in  the  set  under  consideration,  as  described  on  page  137.  L  is  very 
large,  as  expected, if  e  is  small  compared  with  a  ,  for  in  this  case 
q  is  slightly  less  and  p  slightly  greater  than  1/2,  so  that  the 
denominator  on  the  right-hand  side  is  positive  but  very  small. 

Heuristically,  it  appears  very  probable  that  the  property,  which 
we’ have  demonstrated  for  a  chosen  subset  of  the  set  of  all  valid  outcomes, 
carries  over  to  the  entire  set.  This  is  the  following  property. 

Given  a  positive  constant  e  ,  however  small,  it  is  possible  to  take 
such  a  large  number  of  stimuli  that  the  maximum  likelihood  estimators 
{/i^}  for  the  set  S^,  as  in  the  table  on  page  132,  are  all  within  a 
distance  €  of  n  9  where  A(/(,  <x)  is  the  assumed  position  of  the  true 
parameter  point.  Since  this  has  the  effect  of  making  the  sets  { S^} 
small,  page  133,  it  also  makes  the  confidence  regions  {kt}  small. 


Eq.  (249).  It  is  true  that,  in  addition  to  other  respects  in  which 
the  foregoing  proof  fails  to  be  perfectly  general  and  rigorous,  it 
works  with  the  mart  mum  likelihood  estimates  {  H  but  not  with 
{  <f  ^ }  •  The  { <7  k  }  are  probably  all  very  small  for  our 
particular  subset  of  outcomes  of  the  form  FF  ....  FFSFSS  ....  SS,  and 
do  not  vary  over  a  known  range  as  the  estimators  {U^,}  do.  But 
this  proof,  it  is  felt,  does  make  a  significant  contribution  to  the 
intuitive  conviction  which  most  statisticians  will  already  have  in 
this  kind  of  situation,  to  the  effect  that  large  amounts  of  data,  or 
large  numbers  of  stimuli  in  the  experiments  with  which  we  are  concerned 
here,  do  have  the  effect  of  producing  small  confidence  regions. 
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